BCS301 Notes AJIET 241009 121415
Prepared by
Email : shanthakk99@gmail.com
Lecture Notes - BCS301 : Mathematics for Computer Science - Module 1 : Probability Distributions
Table of Contents
1 Probability Distributions
  Review of basic probability theory
  Random Variables
2.5 Vector
2.11 Markov chain
3.9 Test for the Mean Number of Successes in Normal Approximation to Binomial Distribution
4.5 Test of significance of difference between sample means (small samples)
5.4 Steps involved in Two-way ANOVA (when repeated values are not there)
5.5 Two-way ANOVA technique when repeated values are there
Probability Distributions
Syllabus : Review of basic probability theory, Random variables (discrete and con-
tinuous), probability mass and density functions, Mathematical expectation, mean
and variance, Binomial, Poisson and normal distributions- problems (derivations for
mean and standard deviation for Binomial and Poisson distributions only)-Illustrative
examples. exponential distribution.
• Sample Space : The sample space S is the collection of all possible outcomes
of a random experiment. The elements of S are called sample points. A sample
space is called discrete if it is a finite or a countably infinite set. An uncountable
sample space is called a continuous sample space.
Example 2 : Consider the experiment of tossing a coin twice.
Then S = {HH, HT, TH, TT}. For a small sample space like this, one can
readily list all of the possible events. Since S has 4 sample points, there are
2^4 = 16 possible events. Some of them are A = {HH}, B = {HT}, C = {TH},
D = {TT}, E = {HH, HT}, F = {HH, TH} and so on.
Example 3 : Tossing a fair coin thrice. Consider the experiment of flipping a coin
three times and recording the possible outcomes of the three flips. In this case, the
sample space is S = {TTT, TTH, THT, HTT, THH, HTH, HHT, HHH}.
Here {HHH}, {HHT, THH, TTT}, {HHH, TTT} etc. are some of the
events.
Example 4 : Throwing a fair die. There are 6 possible outcomes, one for each face.
The associated finite sample space is S = {1, 2, 3, 4, 5, 6}. Some events are
A = the event of getting an odd face = {1, 3, 5}, B = the event of getting a six = {6},
and so on.
• Equally Likely Events (Equiprobable events) : Two or more events are equally
likely if they have equal chance of occurrence. That is, equally likely events are
such that none of them has greater chance of occurrence than the others.
Example 1. While tossing a fair coin, the outcomes ’Head’ and ’Tail’ are
equally likely.
Example 2. While throwing a fair die, the events A = {2, 4, 6}, B =
{1, 3, 5} and C = {1, 2, 3} are equally likely.
• Mutually Exclusive events (Disjoint events) : Two or more events are mutu-
ally exclusive if only one of them can occur at a time. That is, the occurrence
of any of these events totally excludes the occurrence of the other events. Mu-
tually exclusive events cannot occur together.
Example 1. While tossing a coin, the outcomes ’Head’ and ’Tail’ are mutually
exclusive because when the coin is tossed once, the result cannot be Head as
well as Tail.
• Axioms of Probability :
Addition Theorem : For any sets A and B (not necessarily mutually exclusive),
P (A ∪ B) = P (A) + P (B) − P (A ∩ B)
The addition theorem of probability for three events is given by
P (A ∪ B ∪ C) = P (A) + P (B) + P (C) − P (A ∩ B)
− P (A ∩ C) − P (B ∩ C) + P (A ∩ B ∩ C)
The probability of an event A occurring when it is known that some event B has
occurred is called the conditional probability of A given that B has occurred,
and is denoted by P(A|B). It is defined by
\[ P(A|B) = \frac{P(A \cap B)}{P(B)}, \quad \text{if } P(B) > 0 \]
A random variable is a function that associates a real number with each element in
the sample space. In other words, it is a function from the sample space S to the set
of all real numbers, denoted as X : S → ℝ.
Example 1: While tossing a coin, suppose that the value 1 is associated for the out-
come ‘head’ and 0 for the outcome ‘tail’. In other words, let X= no. of heads.
We have the sample space S = {H, T } and if X is the random variable then
X(H) = 1 & X(T ) = 0.
Hence range of X = {0, 1}
Example 2: Consider the random experiment of tossing three fair coins.
Then S = {HHH, HHT, HTH, THH, TTH, THT, HTT, TTT}.
Define X as the number of heads that appear.
Hence, X(HHH) = 3, X(HHT) = 2, X(HTH) = 2, X(THH) = 2, X(HTT) = 1, X(THT) = 1,
X(TTH) = 1 and X(TTT) = 0.
Here the image set (range) of the random variable is {0, 1, 2, 3}.
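The idea that a random variable is simply a function on the sample space can be sketched in a few lines of Python. This is only an illustration for the three-coin example; the names `sample_space`, `number_of_heads`, `image_set` and `pmf` are chosen here and do not come from the notes, and the coin is assumed to be fair so that every outcome has probability 1/8.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Sample space for three tosses: tuples such as ('H', 'T', 'H').
sample_space = list(product("HT", repeat=3))

def number_of_heads(outcome):
    """The random variable X: maps an outcome to its number of heads."""
    return outcome.count("H")

# Image set (range) of X.
image_set = sorted({number_of_heads(s) for s in sample_space})

# For a fair coin all 8 outcomes are equally likely, so
# P(X = x) = (number of outcomes mapped to x) / 8.
counts = Counter(number_of_heads(s) for s in sample_space)
pmf = {x: Fraction(counts[x], len(sample_space)) for x in image_set}

print(image_set)
for x, p in pmf.items():
    print(x, p)
```

Exact fractions are used instead of floats so that the probabilities can be compared directly with hand calculations.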
Example : Other examples of discrete random variables are the size of a family, the
number of accidents on a road, the number of customers in a bank, the number of cars
manufactured by a company, the number of transactions in a bank per day, etc.
A random variable ’X’ which takes all possible values in a given interval is called a
continuous random variable.
Example : If X takes all values in an interval such as [0, 1], then we cannot list the values
taken by X. Hence X is a continuous random variable.
Other examples are, The temperature at a location, The life time of electronic compo-
nent, The reaction temperature at chemical laboratory, Current in a semi-conductor
diode etc.
Note :
Remember that discrete random variables can take only a countable number of possible
values. On the other hand, a continuous random variable X has a range in the
form of an interval.
Let X be a discrete random variable defined on the sample space S, and let the values
of X be x1, x2, x3, · · · . The function p(xi) = P(X = xi), which assigns
a probability to each possible value of the random variable, is said to be a probability
mass function (pmf) (or probability distribution) if it satisfies the following
properties.
(a) Non-negativity: $p(x_i) \ge 0$ for all $x_i$
(b) Normality: $\sum_{i=1}^{\infty} p(x_i) = 1$
Note : The probability that a random variable X takes a value in the closed interval
[a, b] is given by
\[ P(a \le X \le b) = \sum_{a \le x \le b} p(x) \]
Note : Sometimes the probability mass function is also called a probability density
function.
X = Number of heads      0     1     2     3
Number of occurrences    1     3     3     1
P(x)                     1/8   3/8   3/8   1/8
Note : The probability that a random variable X takes a value in the (open or
closed) interval from a to b is given by the integral of a function f(x), i.e.
\[ P(a \le X \le b) = \int_a^b f(x)\,dx \]
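Mean and Variance of discrete random variable :
• If X is a discrete random variable with pmf p(x), then the expected value
of X is defined by
\[ E(X) = \sum_x x\,p(x) \]
This is also sometimes called the mean of the random variable X and denoted
as x̄ or µ.
• The expected value of X² is
\[ E(X^2) = \sum_x x^2\,p(x) \]
• The quantity $E[(X-\mu)^2] = E(X^2) - \mu^2$ is called the variance of the
random variable X and is denoted Var(X) or V(X) or σ².
• The square root of the variance, $\sigma = \sqrt{Var(X)}$, is called the standard
deviation.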
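As a small illustration of the formulas E(X) = Σ x p(x) and Var(X) = E(X²) − µ², the following Python sketch computes the mean, variance and standard deviation of a discrete distribution. The helper name `mean_and_variance` is hypothetical, and the distribution used is the number-of-heads table for three fair coins.

```python
from fractions import Fraction
from math import sqrt

def mean_and_variance(pmf):
    """Return (E(X), Var(X)) for a pmf given as {value: probability}."""
    mean = sum(x * p for x, p in pmf.items())
    second_moment = sum(x * x * p for x, p in pmf.items())  # E(X^2)
    return mean, second_moment - mean ** 2

# Number of heads in three tosses of a fair coin.
heads_pmf = {
    0: Fraction(1, 8),
    1: Fraction(3, 8),
    2: Fraction(3, 8),
    3: Fraction(1, 8),
}

mu, var = mean_and_variance(heads_pmf)
sd = sqrt(var)  # standard deviation as a float
print(mu, var, sd)
```

Here the mean works out to 3/2 and the variance to 3/4, which can be checked by hand from the table.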
Mean and Variance of continuous random variable :
• If X is a continuous random variable with pdf f(x), then the expected value
of X is defined by
\[ E(X) = \int_{-\infty}^{\infty} x\,f(x)\,dx \]
• The expected value of X² is
\[ E(X^2) = \int_{-\infty}^{\infty} x^2 f(x)\,dx \]
• The quantity $E[(X-\mu)^2] = E(X^2) - \mu^2$ is called the variance of the
random variable X and is denoted Var(X) or V(X) or σ².
• The square root of the variance, $\sigma = \sqrt{Var(X)}$, is called the standard
deviation.
Solution : Here, p(x) is a valid pmf if (a) $p(x) \ge 0$ and (b) $\sum p(x) = 1$.
Hence we must have k ≥ 0 and
k + 3k + 5k + 7k + 9k + 11k + 13k = 1
i.e., 49k = 1 ⇒ k = 1/49
\[ P(X < 4) = P(X=0) + P(X=1) + P(X=2) + P(X=3) = 16k = \frac{16}{49} \]
\[ P(X \ge 5) = P(5) + P(6) = 11k + 13k = 24k = \frac{24}{49} \]
\[ P(3 < X \le 6) = P(4) + P(5) + P(6) = 33k = \frac{33}{49} \]
Problem 2. A discrete random variable X has the p.m.f.:
x      1   2    3    4    5    6     7
f(x)   k   2k   2k   3k   k²   2k²   7k² + k
(a) Find k (b) Evaluate P(X < 3), P(X ≥ 6)
Solution: We know that f(x) must satisfy two conditions to qualify as a p.m.f.:
(i) $f(x) \ge 0 \ \forall x$
(ii) $\sum_{x} f(x) = 1$
Now, we use condition (ii) to evaluate k :
\[
\begin{aligned}
\sum_x f(x) &= f(1) + f(2) + f(3) + f(4) + f(5) + f(6) + f(7) = 1 \\
&\Rightarrow k + 2k + 2k + 3k + k^2 + 2k^2 + 7k^2 + k = 1 \\
&\Rightarrow 10k^2 + 9k = 1 \\
&\Rightarrow 10k^2 + 10k - k - 1 = 0 \\
&\Rightarrow 10k(k+1) - 1(k+1) = 0 \\
&\Rightarrow (10k - 1)(k + 1) = 0 \\
&\Rightarrow k = -1 \ \text{or} \ k = \tfrac{1}{10}
\end{aligned}
\]
Now if we take k = −1, then condition (i) is not satisfied.
Taking k = 1/10, condition (i) is satisfied for all x. Next,
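\[
\begin{aligned}
P(X < 3) &= P(X = 1) + P(X = 2) = f(1) + f(2) \\
&= \frac{1}{10} + 2\left(\frac{1}{10}\right) \quad (\text{since } f(1) = k,\ f(2) = 2k) \\
&= \frac{3}{10}
\end{aligned}
\]
\[
\begin{aligned}
P(X \ge 6) &= P(X = 6) + P(X = 7) = f(6) + f(7) \\
&= 2\left(\frac{1}{10}\right)^2 + 7\left(\frac{1}{10}\right)^2 + \frac{1}{10} \\
&= \frac{2}{100} + \frac{7}{100} + \frac{10}{100} = \frac{19}{100}
\end{aligned}
\]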
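The two steps of Problem 2 (reject the root that violates non-negativity, then add up the required probabilities) can be checked with exact arithmetic. The sketch below is illustrative only; `pmf_problem2` and the other names are introduced here and are not part of the notes.

```python
from fractions import Fraction

def pmf_problem2(k):
    """Table of Problem 2: f(1), ..., f(7) for a given value of k."""
    return {1: k, 2: 2 * k, 3: 2 * k, 4: 3 * k,
            5: k ** 2, 6: 2 * k ** 2, 7: 7 * k ** 2 + k}

# The two roots of 10k^2 + 9k - 1 = 0 obtained in the solution.
candidates = [Fraction(-1), Fraction(1, 10)]

# Keep only roots for which f sums to 1 and has no negative value.
valid = [k for k in candidates
         if sum(pmf_problem2(k).values()) == 1
         and all(p >= 0 for p in pmf_problem2(k).values())]

k = valid[0]
f = pmf_problem2(k)
p_less_3 = f[1] + f[2]        # P(X < 3)
p_at_least_6 = f[6] + f[7]    # P(X >= 6)
print(k, p_less_3, p_at_least_6)
```

Note that k = −1 also makes the total equal to 1, so the non-negativity check is what rules it out.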
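Problem 3. A random variable x has the following density function
\[ P(x) = \begin{cases} kx^2, & -3 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases} \]
Evaluate k and find (i) P(1 ≤ x ≤ 2) (ii) P(x ≤ 2) (iii) P(x > 1)
Solution :
\[ \int_{-\infty}^{\infty} P(x)\,dx = 1 \Rightarrow \int_{-3}^{3} kx^2\,dx = 1 \Rightarrow k\left[\frac{x^3}{3}\right]_{-3}^{3} = 1 \Rightarrow 18k = 1 \Rightarrow k = \frac{1}{18} \]
(i)
\[ P(1 \le x \le 2) = \int_1^2 \frac{x^2}{18}\,dx = \frac{1}{18}\left[\frac{x^3}{3}\right]_1^2 = \frac{1}{54}(8 - 1) = \frac{7}{54} \]
(ii)
\[ P(x \le 2) = \int_{-3}^{2} \frac{x^2}{18}\,dx = \frac{1}{18}\left[\frac{x^3}{3}\right]_{-3}^{2} = \frac{1}{54}(8 + 27) = \frac{35}{54} \]
(iii)
\[ P(x > 1) = \int_1^3 \frac{x^2}{18}\,dx = \frac{1}{18}\left[\frac{x^3}{3}\right]_1^3 = \frac{1}{54}(27 - 1) = \frac{26}{54} = \frac{13}{27} \]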
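Problem 4. The probability density function of a variate x is,
x      −2    −1   0     1    2     3
P(X)   0.1   k    0.2   2k   0.3   k
Since the probabilities must add up to 1, we have 0.6 + 4k = 1, so k = 0.1.
Mean is given by
\[
\begin{aligned}
\mu &= \sum x_i P(x_i) \\
&= (-2)(0.1) + (-1)k + (0)(0.2) + (1)(2k) + 2(0.3) + 3k \\
&= -0.2 - 0.1 + 0 + 0.2 + 0.6 + 0.3 = 0.8
\end{aligned}
\]
Variance is given by
\[
\begin{aligned}
\sigma^2 &= E[X^2] - \mu^2 = \sum x_i^2 P(x_i) - \mu^2 \\
&= (-2)^2(0.1) + (-1)^2 k + (0)^2(0.2) + (1)^2(2k) + 2^2(0.3) + 3^2 k - (0.8)^2 \\
&= (0.4 + 0.1 + 0 + 0.2 + 1.2 + 0.9) - (0.8)^2 = 2.8 - 0.64 = 2.16
\end{aligned}
\]
Thus Mean = 0.8, Variance = 2.16 and S.D. = $\sqrt{Var(X)} = \sqrt{2.16} \approx 1.47$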
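Problem 5. (i) Is the function defined by $f(x) = e^{-x}$, x > 0, f(x) = 0, x < 0
a density function?
(ii) If so, determine the probability that the variate having this density will fall in
the interval (1, 2)
Solution : (i) f(x) is a density function if (a) $f(x) \ge 0$ for all x and
(b) $\int_{-\infty}^{\infty} f(x)\,dx = 1$. Clearly f(x) ≥ 0 for all x, and
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_0^{\infty} e^{-x}\,dx = \left[-e^{-x}\right]_0^{\infty} = -(0 - 1) = 1 \]
i.e. condition (b) is satisfied.
Hence f(x) is a probability density function.
(ii) The probability that the variate having this density will fall in the interval (1, 2) is
given by
\[ P(1 < x < 2) = \int_1^2 f(x)\,dx = \left[-e^{-x}\right]_1^2 = -\left(e^{-2} - e^{-1}\right) = 0.2325 \]
Thus P(1 < x < 2) = 0.2325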
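Problem 6. Find the constant k such that the function
\[ f(x) = \begin{cases} kx^2, & 0 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases} \]
is a p.d.f. Also find (i) P(1 < X < 2) (ii) P(X ≤ 1) (iii) P(X > 1) (iv) Mean
(v) Variance [VTU July 2023, Jan 2018]
Solution : Since f(x) is a p.d.f.,
\[ \int_0^3 kx^2\,dx = 1 \Rightarrow k\left[\frac{x^3}{3}\right]_0^3 = 9k = 1 \Rightarrow k = \frac{1}{9} \]
(i)
\[ P(1 < x < 2) = \int_1^2 f(x)\,dx = \int_1^2 \frac{x^2}{9}\,dx = \left[\frac{x^3}{27}\right]_1^2 = \frac{7}{27} \]
(ii)
\[ P(x \le 1) = \int_0^1 f(x)\,dx = \int_0^1 \frac{x^2}{9}\,dx = \left[\frac{x^3}{27}\right]_0^1 = \frac{1}{27} \]
(iii)
\[ P(x > 1) = \int_1^3 f(x)\,dx = \int_1^3 \frac{x^2}{9}\,dx = \left[\frac{x^3}{27}\right]_1^3 = \frac{26}{27} \]
(iv) Mean,
\[ \mu = \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_0^3 x \cdot \frac{x^2}{9}\,dx = \left[\frac{x^4}{36}\right]_0^3 = \frac{81}{36} = \frac{9}{4} \]
(v) Variance
\[
\begin{aligned}
V &= E[X^2] - \mu^2 = \int_{-\infty}^{\infty} x^2 f(x)\,dx - \mu^2 \\
&= \int_0^3 x^2 \cdot \frac{x^2}{9}\,dx - \left(\frac{9}{4}\right)^2 = \left[\frac{x^5}{45}\right]_0^3 - \frac{81}{16} \\
&= \frac{81}{15} - \frac{81}{16} = \frac{81}{240} = \frac{27}{80}
\end{aligned}
\]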
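Since every integrand in Problem 6 is a single power of x, the results can be verified exactly with the rule ∫ c·xⁿ dx = c·xⁿ⁺¹/(n+1). The following is a minimal sketch; the helper `integral_power` is a name introduced here for illustration.

```python
from fractions import Fraction

def integral_power(n, a, b, coeff):
    """Exact value of the integral of coeff * x**n over [a, b]."""
    a, b = Fraction(a), Fraction(b)
    return coeff * (b ** (n + 1) - a ** (n + 1)) / (n + 1)

# Normalisation: k * (integral of x^2 over [0, 3]) = 1.
k = 1 / integral_power(2, 0, 3, Fraction(1))

p_1_2 = integral_power(2, 1, 2, k)    # P(1 < X < 2)
p_le_1 = integral_power(2, 0, 1, k)   # P(X <= 1)
p_gt_1 = integral_power(2, 1, 3, k)   # P(X > 1)
mean = integral_power(3, 0, 3, k)     # integral of x * k x^2
variance = integral_power(4, 0, 3, k) - mean ** 2

print(k, p_1_2, p_le_1, p_gt_1, mean, variance)
```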
Problem 7. The pdf of the random variable X is given by the following table.
x      −3   −2   −1   0    1    2    3
P(x)   k    2k   3k   4k   3k   2k   k
Find (i) k (ii) P(X ≤ 1) (iii) P(X > 1) (iv) P(−1 < X ≤ 2) (v) Mean of X
(vi) SD of X [VTU: Dec/ Jan 16, July 2013]
Solution : Since p(x) is a pdf, we must have (a) $p(x) \ge 0$ for all x and
(b) $\sum p(x) = 1$.
(i) From condition (a) we have k ≥ 0. Using condition (b), we have
k + 2k + 3k + 4k + 3k + 2k + k = 1 or 16k = 1 ∴ k = 1/16
(ii)
\[ P(X \le 1) = p(-3) + p(-2) + p(-1) + p(0) + p(1) = 13k = \frac{13}{16} \]
(iii)
\[ P(X > 1) = p(2) + p(3) = 3k = \frac{3}{16} \]
(iv)
\[ P(-1 < X \le 2) = p(0) + p(1) + p(2) = 9k = \frac{9}{16} \]
(v)
\[ \text{Mean} = \mu = \sum x \cdot p(x) = -3(k) - 2(2k) - 1(3k) + 0(4k) + 1(3k) + 2(2k) + 3(k) = 0 \]
(vi)
\[
\begin{aligned}
\text{Variance} = V &= \sum (x - \mu)^2 \cdot p(x) = E(X^2) - \mu^2 = \sum x^2 \cdot p(x) - 0 \\
&= (-3)^2(k) + (-2)^2(2k) + (-1)^2(3k) + 0^2(4k) + 1^2(3k) + 2^2(2k) + 3^2(k) \\
&= k(9 + 8 + 3 + 0 + 3 + 8 + 9) = \frac{1}{16}(40) = \frac{5}{2}
\end{aligned}
\]
Thus k = 1/16, Mean = 0 and S.D. = $\sqrt{5/2} \approx 1.58$
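Solution : Given pdf is $p(x) = y_0 e^{-|x|}$, −∞ < x < ∞
Here, X is a continuous random variable.
Hence we must have,
\[
\begin{aligned}
\int_{-\infty}^{\infty} p(x)\,dx = 1 &\Rightarrow \int_{-\infty}^{\infty} y_0 e^{-|x|}\,dx = 1 \\
&\Rightarrow 2\int_0^{\infty} y_0 e^{-|x|}\,dx = 1 \quad (\because e^{-|x|} \text{ is an even function}) \\
&\Rightarrow 2\int_0^{\infty} y_0 e^{-x}\,dx = 1 \quad (\because |x| = x \text{ for } x > 0) \\
&\Rightarrow 2y_0\left[\frac{e^{-x}}{(-1)}\right]_0^{\infty} = 1 \\
&\Rightarrow 2y_0[0 - (-1)] = 1 \Rightarrow y_0 = \frac{1}{2}
\end{aligned}
\]
Mean is,
\[ \mu = \int_{-\infty}^{\infty} x\,p(x)\,dx = y_0\int_{-\infty}^{\infty} x e^{-|x|}\,dx = 0 \quad (\because x e^{-|x|} \text{ is an odd function}) \]
Variance is,
\[
\begin{aligned}
\sigma^2 = E(X^2) - \mu^2 &= y_0\int_{-\infty}^{\infty} x^2 e^{-|x|}\,dx - 0 = y_0 \cdot 2\int_0^{\infty} x^2 e^{-x}\,dx \\
&= 2y_0\left[(x^2)\frac{e^{-x}}{(-1)} - (2x)\frac{e^{-x}}{(-1)^2} + (2)\frac{e^{-x}}{(-1)^3}\right]_0^{\infty} \\
&= 2y_0[(0 - 0 + 0) - (0 - 0 - 2)] = 4y_0 = 2
\end{aligned}
\]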
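Solution : Given pdf is $p(x) = 2^{-x} = \frac{1}{2^x}$, x = 1, 2, 3, · · ·
Clearly (a) p(x) > 0 for all x and
(b)
\[
\begin{aligned}
\sum_i p(x_i) = \sum_{x=1}^{\infty} \frac{1}{2^x} &= \frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \cdots \\
&= \frac{1}{2}\left[1 + \frac{1}{2} + \frac{1}{2^2} + \cdots\right] \\
&= \frac{1}{2}\left(\frac{1}{1 - \frac{1}{2}}\right) = 1
\end{aligned}
\]
(∵ geometric series $1 + r + r^2 + \cdots = \frac{1}{1-r}$ for |r| < 1)
(i)
\[
\begin{aligned}
p(X \text{ is even}) &= p(X = 2) + p(X = 4) + p(X = 6) + \cdots \\
&= \frac{1}{2^2} + \frac{1}{2^4} + \frac{1}{2^6} + \cdots \\
&= \frac{1}{2^2}\left[1 + \frac{1}{2^2} + \left(\frac{1}{2^2}\right)^2 + \cdots\right] \\
&= \frac{1}{2^2}\left[\frac{1}{1 - \frac{1}{2^2}}\right] = \frac{1}{4} \cdot \frac{4}{3} = \frac{1}{3}
\end{aligned}
\]
Thus p(X is even) = 1/3
(ii)
\[
\begin{aligned}
p(X \text{ is divisible by } 3) &= p(X = 3) + p(X = 6) + p(X = 9) + \cdots \\
&= \frac{1}{2^3} + \frac{1}{2^6} + \frac{1}{2^9} + \cdots \\
&= \frac{1}{2^3}\left[1 + \frac{1}{2^3} + \left(\frac{1}{2^3}\right)^2 + \cdots\right] \\
&= \frac{1}{2^3}\left[\frac{1}{1 - \frac{1}{2^3}}\right] = \frac{1}{2^3} \times \frac{8}{7} = \frac{1}{7}
\end{aligned}
\]
Thus p(X is divisible by 3) = 1/7
(iii)
\[
\begin{aligned}
p(X \ge 5) &= 1 - p(X < 5) = 1 - \sum_{x=1}^{4} \frac{1}{2^x} \\
&= 1 - \left[\frac{1}{2} + \frac{1}{2^2} + \frac{1}{2^3} + \frac{1}{2^4}\right] \\
&= 1 - \left[\frac{1}{2} + \frac{1}{4} + \frac{1}{8} + \frac{1}{16}\right] = 1 - \frac{15}{16} = \frac{1}{16}
\end{aligned}
\]
Thus p(X ≥ 5) = 1/16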
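The geometric-series results for p(x) = 2^(−x) can be sanity-checked by summing a long but finite part of each series. This is a rough numerical sketch, not a proof; the truncation point `N` is an arbitrary choice made here, and the neglected tail is smaller than 2^(−N).

```python
from fractions import Fraction

def p(x):
    """pmf p(x) = 2**(-x) for x = 1, 2, 3, ..."""
    return Fraction(1, 2 ** x)

N = 60  # truncation point for the infinite sums (assumption of this sketch)

total = sum(p(x) for x in range(1, N + 1))        # should be close to 1
p_even = sum(p(x) for x in range(2, N + 1, 2))    # close to 1/3
p_div_3 = sum(p(x) for x in range(3, N + 1, 3))   # close to 1/7
p_ge_5 = 1 - sum(p(x) for x in range(1, 5))       # finite sum, exact

print(float(total), float(p_even), float(p_div_3), p_ge_5)
```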
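Problem 10. X is a continuous random variable with probability density function
given by
\[ f(x) = \begin{cases} kx, & 0 \le x < 2 \\ 2k, & 2 \le x < 4 \\ -kx + 6k, & 4 \le x < 6 \end{cases} \]
Find k and mean value of X
Solution : Since the total probability is 1,
\[ \int_0^2 kx\,dx + \int_2^4 2k\,dx + \int_4^6 (-kx + 6k)\,dx = 1 \Rightarrow 2k + 4k + (-10k + 12k) = 1 \Rightarrow k = \frac{1}{8} \]
Mean of X is
\[
\begin{aligned}
\mu &= \int_{-\infty}^{\infty} x\,f(x)\,dx = \int_0^6 x\,f(x)\,dx \\
&= \int_0^2 kx^2\,dx + \int_2^4 2kx\,dx + \int_4^6 x(-kx + 6k)\,dx \\
&= k\left[\frac{x^3}{3}\right]_0^2 + 2k\left[\frac{x^2}{2}\right]_2^4 + \left(-k\left[\frac{x^3}{3}\right]_4^6 + 6k\left[\frac{x^2}{2}\right]_4^6\right) \\
&= k\left(\frac{8}{3}\right) + k(12) - k\left(\frac{152}{3}\right) + 3k(20) = 24k = \frac{1}{8}(24) = 3
\end{aligned}
\]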
The probability that X falls in the interval (a, b) is given by the difference in the
CDF values at b and a:
\[ P(a < X < b) = F(b) - F(a) = \int_a^b f(x)\,dx \]
This is essentially the area under the PDF curve between a and b.
Problem 11. Suppose that the error in the reaction temperature, in °C, for a controlled
laboratory experiment is a continuous random variable X having the probability
density function
\[ f(x) = \begin{cases} \dfrac{x^2}{3}, & -1 < x < 2 \\ 0, & \text{elsewhere} \end{cases} \]
(a) Verify that f(x) is a density function.
(b) Find P(0 < X ≤ 1).
(c) Find the cumulative distribution function.
Solution : (a) To verify that f(x) is a density function, we verify the following two
conditions.
(i) Non-negativity : i.e., f(x) ≥ 0 for all x.
(ii) Normality : i.e., $\int_{-\infty}^{\infty} f(x)\,dx = 1$
Obviously, f(x) ≥ 0, and
\[ \int_{-\infty}^{\infty} f(x)\,dx = \int_{-1}^{2} \frac{x^2}{3}\,dx = \left[\frac{x^3}{9}\right]_{-1}^{2} = \frac{8}{9} + \frac{1}{9} = 1 \]
Hence f(x) is a density function.
(b)
\[ P(0 < X \le 1) = \int_0^1 \frac{x^2}{3}\,dx = \left[\frac{x^3}{9}\right]_0^1 = \frac{1}{9} \]
(c) For −1 ≤ x < 2,
\[ F(x) = \int_{-1}^{x} \frac{t^2}{3}\,dt = \left[\frac{t^3}{9}\right]_{-1}^{x} = \frac{x^3 + 1}{9} \]
Therefore,
\[ F(x) = \begin{cases} 0, & x < -1 \\ \dfrac{x^3 + 1}{9}, & -1 \le x < 2 \\ 1, & x \ge 2 \end{cases} \]
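The relation P(a < X ≤ b) = F(b) − F(a) can be tried out on the CDF just obtained. The function name `cdf` below is chosen for this sketch only.

```python
from fractions import Fraction

def cdf(x):
    """CDF of Problem 11: F(x) = (x^3 + 1)/9 on [-1, 2)."""
    x = Fraction(x)
    if x < -1:
        return Fraction(0)
    if x < 2:
        return (x ** 3 + 1) / 9
    return Fraction(1)

# P(0 < X <= 1) as a difference of CDF values.
p_0_1 = cdf(1) - cdf(0)
print(p_0_1)
```

The same value 1/9 was found in part (b) by integrating the density directly.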
Problem 12. A random variable X takes the values −3, −2, −1, 0, 1, 2, 3 such
that P (X = 0) = P (X < 0) and P (X = −3)=P (X = −2)=P (X = −1)
= P (X = 1) = P (X = 2) = P (X = 3). Find the probability distribution.
[VTU July 2017]
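Solution : Let $p_1, p_2, \ldots, p_7$ be the probabilities of X = −3, −2, −1, 0, 1, 2, 3
respectively, and let each of the six equal probabilities be k. Then
$p_4 = P(X = 0) = P(X < 0) = 3k$. Since the total probability is 1,
\[
\begin{aligned}
p_1 + p_2 + p_3 + p_4 + p_5 + p_6 + p_7 &= 1 \\
k + k + k + 3k + k + k + k &= 1 \\
9k = 1 \Rightarrow k &= \frac{1}{9}
\end{aligned}
\]
Substituting,
\[ p_1 = p_2 = p_3 = p_5 = p_6 = p_7 = \frac{1}{9} \quad \text{and} \quad p_4 = 3k = \frac{3}{9} \]
Thus the probability distribution is as follows.
X      −3    −2    −1    0     1     2     3
P(x)   1/9   1/9   1/9   3/9   1/9   1/9   1/9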
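Problem 13. If the random variable X takes the values 1, 2, 3 and 4 such that
2P(X = 1) = 3P(X = 2) = P(X = 3) = 5P(X = 4), find the
probability distribution function and cumulative distribution function of X.
Solution : Let 2P(X = 1) = 3P(X = 2) = P(X = 3) = 5P(X = 4) = c. Then
P(X = 1) = c/2, P(X = 2) = c/3, P(X = 3) = c, P(X = 4) = c/5.
Since the total probability is 1,
\[ c\left(\frac{1}{2} + \frac{1}{3} + 1 + \frac{1}{5}\right) = \frac{61c}{30} = 1 \Rightarrow c = \frac{30}{61} \]
Thus the probability distribution is
X      1       2       3       4
P(x)   15/61   10/61   30/61   6/61
The cumulative distribution function F(x) = P(X ≤ x) is obtained as follows.
If x < 1, then F(x) = 0,
If 1 ≤ x < 2, then F(x) = P(X = 1) = 15/61,
If 2 ≤ x < 3, then F(x) = P(X = 1) + P(X = 2) = 25/61,
If 3 ≤ x < 4, then
F(x) = P(X = 1) + P(X = 2) + P(X = 3) = 55/61,
If x ≥ 4, then
F(x) = P(X = 1) + P(X = 2) + P(X = 3) + P(X = 4) = 1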
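A condition of the form 2P(1) = 3P(2) = P(3) = 5P(4) says that each probability is a common constant divided by its multiplier, so the distribution and its CDF can be built mechanically. The sketch below assumes this reading of Problem 13; the names `weights`, `c`, `pmf` and `cdf` are introduced here.

```python
from fractions import Fraction
from itertools import accumulate

# 2P(1) = 3P(2) = 1P(3) = 5P(4) = c  means  P(x) = c / weights[x].
weights = {1: 2, 2: 3, 3: 1, 4: 5}

# Total probability 1 fixes the common constant c.
c = 1 / sum(Fraction(1, w) for w in weights.values())

pmf = {x: c / w for x, w in weights.items()}

# Running totals give the CDF values at the jump points x = 1, 2, 3, 4.
cdf = dict(zip(pmf, accumulate(pmf.values())))

print(c)
print(pmf)
print(cdf)
```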
Problem 14. A coin is tossed twice. A random variable X represents the number of
heads turning up. Find the discrete probability distribution for X. Also find its mean
and variance.
Solution : When a fair coin is tossed twice, the possible outcomes for each toss are
either heads (H) or tails (T). The random variable X represents the number of heads
turning up in the two tosses.
Here, Sample Space, S = {HH, HT, TH, TT}.
X = number of heads
The values of X associated with the elements of S are respectively
2, 1, 1, 0
Now, P(HH) = 1/4, P(HT) = 1/4, P(TH) = 1/4, P(TT) = 1/4
When X = 0, both tosses result in tails (TT). Hence P(X = 0) = P(TT) = 1/4
When X = 1, one toss results in heads and the other in tails (HT or TH).
P(X = 1) = P(HT ∪ TH) = 1/4 + 1/4 = 1/2
When X = 2, both tosses result in heads (HH). Hence P(X = 2) = P(HH) = 1/4
The discrete probability distribution for X is as follows.
X = xi    0     1     2
p(xi)     1/4   1/2   1/4
Clearly, $p(x_i) > 0$ and $\sum p(x_i) = 1$
\[ \text{Mean} = \mu = \sum x_i\,p(x_i) = (0)\frac{1}{4} + (1)\frac{1}{2} + (2)\frac{1}{4} = 1 \]
\[ E(X^2) = 0^2 \cdot \frac{1}{4} + 1^2 \cdot \frac{1}{2} + 2^2 \cdot \frac{1}{4} = \frac{3}{2} \]
Variance is
\[ Var(X) = E(X^2) - [E(X)]^2 = \frac{3}{2} - 1 = \frac{1}{2} \]
Thus we have, Mean = 1 and Variance = 1/2
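Problem 15. Find the constant k such that the function
\[ f(x) = \begin{cases} kxe^{-x}, & 0 \le x \le 1 \\ 0, & \text{elsewhere} \end{cases} \]
is a p.d.f. Find the mean. (VTU Model 2022)
Solution : Since f(x) is a p.d.f.,
\[
\begin{aligned}
\int_0^1 kxe^{-x}\,dx = 1 &\Rightarrow k\left[(-xe^{-x}) - e^{-x}\right]_0^1 = 1 \\
&\Rightarrow k\left[(-2)e^{-1} + 1\right] = 1 \\
&\Rightarrow k = \frac{1}{1 - 2e^{-1}} = \frac{1}{1 - \frac{2}{e}} = \frac{e}{e - 2}
\end{aligned}
\]
Mean is given by,
\[
\begin{aligned}
\mu &= \int_{-\infty}^{\infty} x \cdot f(x)\,dx = \int_0^1 x \cdot kxe^{-x}\,dx = k\int_0^1 x^2 e^{-x}\,dx \\
&= k\left[x^2\frac{e^{-x}}{(-1)} - 2x\frac{e^{-x}}{(-1)^2} + 2\frac{e^{-x}}{(-1)^3}\right]_0^1 \\
&= k\left[(-1)e^{-1} - 2e^{-1} - 2e^{-1} - (0 - 0 - 2)\right] \\
&= k\left[(-5)e^{-1} + 2\right] = k\left(\frac{2e - 5}{e}\right) \\
&= \frac{e}{e - 2} \cdot \frac{2e - 5}{e} = \frac{2e - 5}{e - 2}
\end{aligned}
\]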
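Problem 16. Let X be the random variable that denotes the life in hours of a certain
electronic device. The probability density function is
\[ f(x) = \begin{cases} \dfrac{20{,}000}{x^3}, & x > 100 \\ 0, & \text{elsewhere} \end{cases} \]
Find the expected life of this type of device.
Solution :
\[ \mu = E(X) = \int_{100}^{\infty} x \cdot \frac{20{,}000}{x^3}\,dx = \int_{100}^{\infty} \frac{20{,}000}{x^2}\,dx = \left[\frac{-20{,}000}{x}\right]_{100}^{\infty} = 0 - \left(\frac{-20{,}000}{100}\right) = 200 \]
Therefore, we can expect this type of device to last, on average, 200 hours.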
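Problem 17. Let the random variable X represent the number of defective parts for
a machine when 3 parts are sampled from a production line and tested. The following
is the probability distribution of X.
x      0      1      2      3
f(x)   0.51   0.38   0.10   0.01
Calculate σ².
Solution :
\[ \mu = (0)(0.51) + (1)(0.38) + (2)(0.10) + (3)(0.01) = 0.61 \]
\[ E(X^2) = (0)(0.51) + (1)(0.38) + (4)(0.10) + (9)(0.01) = 0.87 \]
\[ \sigma^2 = E(X^2) - \mu^2 = 0.87 - (0.61)^2 = 0.4979 \]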
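Problem 18. The weekly demand for a drinking-water product, in thousands of liters,
from a local chain of efficiency stores is a continuous random variable X having the
probability density
\[ f(x) = \begin{cases} 2(x - 1), & 1 < x < 2 \\ 0, & \text{elsewhere} \end{cases} \]
Find the mean and variance of X.
Solution : Here, X is a continuous random variable.
Calculating E(X) and E(X²), we have
\[
\begin{aligned}
\mu = E(X) &= \int_{-\infty}^{\infty} x\,f(x)\,dx = 2\int_1^2 x(x - 1)\,dx = 2\int_1^2 (x^2 - x)\,dx \\
&= 2\left[\frac{x^3}{3} - \frac{x^2}{2}\right]_1^2 = 2\left[\frac{2^3}{3} - \frac{2^2}{2} - \frac{1^3}{3} + \frac{1^2}{2}\right] \\
&= 2\left[\frac{8}{3} - 2 - \frac{1}{3} + \frac{1}{2}\right] = \frac{5}{3}
\end{aligned}
\]
and
\[
\begin{aligned}
E(X^2) &= \int_{-\infty}^{\infty} x^2 f(x)\,dx = 2\int_1^2 (x^3 - x^2)\,dx \\
&= 2\left[\frac{x^4}{4} - \frac{x^3}{3}\right]_1^2 = 2\left[\frac{2^4}{4} - \frac{2^3}{3} - \frac{1^4}{4} + \frac{1^3}{3}\right] \\
&= 2\left[4 - \frac{8}{3} - \frac{1}{4} + \frac{1}{3}\right] = \frac{17}{6}
\end{aligned}
\]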
Dr. Shantha Kumari K AJIET, Mangalore
Therefore,
\[ \sigma^2 = E(X^2) - \mu^2 = \frac{17}{6} - \left(\frac{5}{3}\right)^2 = \frac{1}{18} \]
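Problem 19. A variate X has the probability distribution
x          :  −3    6     9
P(X = x)   :  1/6   1/2   1/3
Find E(X) and E(X²). Hence evaluate E(2X + 1)².
Solution :
\[ E(X) = (-3)\frac{1}{6} + (6)\frac{1}{2} + (9)\frac{1}{3} = \frac{11}{2}, \qquad E(X^2) = (9)\frac{1}{6} + (36)\frac{1}{2} + (81)\frac{1}{3} = \frac{93}{2} \]
\[
\begin{aligned}
E(2X + 1)^2 &= E(4X^2 + 4X + 1) = 4E(X^2) + 4E(X) + 1 \\
&= 4(93/2) + 4(11/2) + 1 \\
&= 209
\end{aligned}
\]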
Bernoulli trials
p + q = 1
Example : In tossing a fair die, if you define success as getting the outcome 6,
then getting any other outcome will be a failure.
Here p = P({6}) = 1/6 and q = P({1, 2, 3, 4, 5}) = 5/6
Clearly, p + q = 1
Suppose that we repeat Bernoulli trials n times independently under the same condi-
tions. An experiment involving such independent Bernoulli trials is called a binomial
experiment.
Let X= no. of successes in n trials.
Clearly X can take the values X = 0, 1, 2, · · · , n
The probability of x successes out of n Bernoulli trials is given by,
\[ P(x) = {}^{n}C_x\,p^x q^{n-x}, \quad \text{with } p + q = 1 \]
This probability distribution can be given as
x      0      1                 2                   · · ·   n
p(x)   q^n    nC1 p q^(n−1)     nC2 p^2 q^(n−2)     · · ·   p^n
where
n = no. of trials
x = no. of successes in n trials
p = Probability of Success in one trial
q = 1 − p is the Probability of Failure in one trial
When X is a binomial random variable we also write it as X ∼ B(n, p)
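The binomial formula translates directly into code. The following is a minimal sketch using only the Python standard library; the function name `binomial_pmf` is chosen here, and the values n = 5, p = 1/6 (number of sixes in five throws of a fair die) are an illustrative assumption, not an example from the notes.

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for X ~ B(n, p): nCx * p^x * q^(n - x) with q = 1 - p."""
    q = 1 - p
    return comb(n, x) * p ** x * q ** (n - x)

n, p = 5, 1 / 6
dist = [binomial_pmf(x, n, p) for x in range(n + 1)]

total = sum(dist)                                  # should be 1
mean = sum(x * px for x, px in enumerate(dist))    # should equal n*p
variance = sum(x * x * px for x, px in enumerate(dist)) - mean ** 2  # n*p*q

for x, px in enumerate(dist):
    print(x, round(px, 4))
print(total, mean, variance)
```

Computing the mean and variance from the table is a quick numerical check of the results µ = np and σ² = npq.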
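The mean of the binomial distribution is
\[
\begin{aligned}
\mu = E(X) &= \sum_{x=0}^{n} x\,P(x) = \sum_{x=0}^{n} x\,\frac{n!}{(n-x)!\,x!}\,p^x q^{n-x} \\
&= \sum_{x=1}^{n} \frac{n!}{(x-1)!\,(n-x)!}\,p^x q^{n-x} \quad (\text{since the } x = 0 \text{ term vanishes}) \\
&= \sum_{x=1}^{n} \frac{n(n-1)!}{(x-1)!\,(n-x)!}\,(p)\,p^{x-1} q^{n-x} \\
&= np\sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,(n-x)!}\,p^{x-1} q^{n-x} \\
&= np\sum_{x=1}^{n} \frac{(n-1)!}{(x-1)!\,[(n-1)-(x-1)]!}\,p^{x-1} q^{n-x} \\
&= np\sum_{x=1}^{n} {}^{n-1}C_{x-1}\,p^{x-1} q^{n-x} \\
&= np\left[{}^{n-1}C_0\,q^{n-1} + {}^{n-1}C_1\,p\,q^{n-2} + {}^{n-1}C_2\,p^2 q^{n-3} + \cdots + {}^{n-1}C_{n-1}\,p^{n-1}\right] \\
&= np\,(p + q)^{n-1} = np \quad (\text{since } p + q = 1)
\end{aligned}
\]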
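For the variance, we first find E(X²), writing x² = x(x − 1) + x:
\[
\begin{aligned}
E(X^2) &= \sum_{x=0}^{n} x^2\;{}^{n}C_x\,p^x q^{n-x} \\
&= \sum_{x=0}^{n} [x(x-1)]\;{}^{n}C_x\,p^x q^{n-x} + \sum_{x=0}^{n} x\;{}^{n}C_x\,p^x q^{n-x} \\
&= \sum_{x=2}^{n} \frac{x(x-1)\,n!}{(n-x)!\,x(x-1)(x-2)!}\,p^x q^{n-x} + np \\
&= n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\,(n-x)!}\,p^{x-2} q^{n-x} + np \\
&= n(n-1)p^2 \sum_{x=2}^{n} \frac{(n-2)!}{(x-2)!\,[(n-2)-(x-2)]!}\,p^{x-2} q^{n-x} + np \\
&= n(n-1)p^2 \sum_{x=2}^{n} {}^{n-2}C_{x-2}\,p^{x-2} q^{n-x} + np \\
&= n(n-1)p^2\left[{}^{n-2}C_0\,q^{n-2} + {}^{n-2}C_1\,p\,q^{n-3} + \cdots + {}^{n-2}C_{n-2}\,p^{n-2}\right] + np \\
&= n(n-1)p^2 (p + q)^{n-2} + np
\end{aligned}
\]
Since p + q = 1, we have
\[ E(X^2) = n(n-1)p^2 + np \]
Using this,
\[ Var(X) = E(X^2) - [E(X)]^2 = n(n-1)p^2 + np - n^2p^2 = np - np^2 = np(1 - p) = npq \]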
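Mean, µ = np
V(X) or σ² = npq is the Variance,
S.D. or $\sigma = \sqrt{npq}$ is the Standard Deviation
Problem 20. Find n and p in the binomial distribution whose mean is 3 and variance
is 2.
Solution : Here np = 3 and npq = 2. Dividing,
\[ \frac{npq}{np} = \frac{2}{3} \Rightarrow q = \frac{2}{3} \quad \therefore\ p = 1 - q = 1 - \frac{2}{3} = \frac{1}{3} \]
\[ np = 3 \Rightarrow n\left(\frac{1}{3}\right) = 3 \Rightarrow n = 9 \]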
Problem 21. A die is tossed thrice. A success is ‘getting 1 or 6’ on a toss. Find the
mean and variance of the number of successes. [VTU Dec 2011]
n = 12 (no. of pens)
Probability of a defective pen is p = 1/10 = 0.1
Probability of a non-defective pen = q = 1 − p = 1 − 0.1 = 0.9
We have $P(x) = {}^{n}C_x\,p^x q^{n-x}$ where n = 12
(i) Prob. (exactly two defectives) is
\[ P(x = 2) = {}^{12}C_2 (0.1)^2 (0.9)^{10} = 0.2301 \]
(ii) Prob. (at least 2 defectives) is,
\[
\begin{aligned}
P(X \ge 2) &= 1 - P(X < 2) = 1 - [P(x = 0) + P(x = 1)] \\
&= 1 - \left[{}^{12}C_0 (0.1)^0 (0.9)^{12} + {}^{12}C_1 (0.1)^1 (0.9)^{11}\right] \\
&= 0.341
\end{aligned}
\]
(iii) Prob. (no defective) is,
\[ P(x = 0) = {}^{12}C_0 (0.1)^0 (0.9)^{12} = (0.9)^{12} = 0.2824 \]
Problem 23. The probability that a bomb dropped hits the target is 0.2, find the
probability that out of 6 bombs dropped (i) Exactly two will hit the target (ii) Atleast
2 will hit the target [VTU:June/ July 15]
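(ii) The probability that at least 2 will hit the target is
\[
\begin{aligned}
P(X \ge 2) &= 1 - P(X < 2) = 1 - [p(0) + p(1)] \\
&= 1 - \left[{}^{n}C_0\,p^0 q^{n-0} + {}^{n}C_1\,p^1 q^{n-1}\right] \\
&= 1 - \left[{}^{6}C_0 \left(\frac{1}{5}\right)^0 \left(\frac{4}{5}\right)^6 + {}^{6}C_1 \left(\frac{1}{5}\right)^1 \left(\frac{4}{5}\right)^5\right] \\
&= 1 - 0.65536 \\
&= 0.3446
\end{aligned}
\]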
Problem 24. In sampling a large no. of parts manufactured by a machine, the mean
number of defectives in a sample of 20 is 2. Out of 1000 such samples, how many
would be expected to contain atleast 3 defective parts.? [VTU June 2019, 2004]
Solution :
In one sample :
Let X = no. of defective parts.
n = no. of items = 20
Given that Mean, µ = np = 2
⇒ p = 2/n = 2/20 = 0.1
Hence q = 1 − p = 0.9
The pmf is $P(x) = {}^{n}C_x\,p^x q^{n-x} = {}^{20}C_x (0.1)^x (0.9)^{20-x}$
Probability of at least 3 defective parts is
P(X ≥ 3) = P(3) + P(4) + · · · + P(20)
OR
\[
\begin{aligned}
P(X \ge 3) &= 1 - P(X < 3) = 1 - [P(0) + P(1) + P(2)] \\
&= 1 - \left[(0.9)^{20} + {}^{20}C_1 (0.1)(0.9)^{19} + {}^{20}C_2 (0.1)^2 (0.9)^{18}\right] \\
&= 0.323
\end{aligned}
\]
Thus, out of 1000 samples, the expected number of samples containing at least 3
defective parts is 1000 × 0.323 = 323
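The arithmetic of Problem 24 is easy to reproduce with the complement rule P(X ≥ 3) = 1 − [P(0) + P(1) + P(2)]. The variable names below are chosen for this sketch only.

```python
from math import comb

n = 20            # items per sample
mean = 2          # given mean number of defectives, np = 2
p = mean / n      # probability that an item is defective
q = 1 - p

# Complement rule: at least 3 defectives = 1 - P(0, 1 or 2 defectives).
p_at_least_3 = 1 - sum(comb(n, x) * p ** x * q ** (n - x) for x in range(3))

# Expected number of such samples out of 1000.
expected_samples = 1000 * p_at_least_3

print(round(p_at_least_3, 3), round(expected_samples))
```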
Problem 25. Out of 800 families of 5 children each, how many would you expect to
have (a) 3 Boys (b) 5 Girls (c) either 2 or 3 boys (d) atmost 2 girls? Assume equal
probabilities for Boys and Girls. [ VTU August 2022, August 2021, 2004]
i.e. Expected number of families with 5 girls is 25.
(c) No. of families with 2 or 3 boys = No. of families with 2 boys + No. of families
with 3 boys, and is given by
\[ 800P(X = 2) + 800P(X = 3) = 800 \times \frac{{}^{5}C_2}{32} + 800 \times \frac{{}^{5}C_3}{32} = 500 \]
Expected number of families with 2 or 3 boys is 500.
(d) No. of families with at most 2 girls means that families can have 5 boys and 0
girls, or 4 boys and 1 girl, or 3 boys and 2 girls.
Hence the required answer is
\[
\begin{aligned}
800P(X = 5) + 800P(X = 4) + 800P(X = 3) &= 800 \times \frac{{}^{5}C_5}{32} + 800 \times \frac{{}^{5}C_4}{32} + 800 \times \frac{{}^{5}C_3}{32} \\
&= 400
\end{aligned}
\]
i.e. Expected number of families with at most 2 girls is 400.
= 0.3828
(iii) P(at most 8 seeds germinate) is
P (X ≤ 8) = 1 − P (X > 8)
= 1 − [P (X = 9) + P (X = 10)]
= 1 − [0.1211 + 0.0282]
= 0.8507
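\[
\begin{aligned}
\text{Mean} = E(X) &= \sum_{x=0}^{\infty} x\,p(x) = \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \lambda e^{-\lambda}\left\{\sum_{x=1}^{\infty} \frac{\lambda^{x-1}}{(x-1)!}\right\} \\
&= \lambda e^{-\lambda}\left[1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \frac{\lambda^3}{3!} + \cdots\right] \\
&= \lambda e^{-\lambda} e^{\lambda} = \lambda
\end{aligned}
\]
\[
\begin{aligned}
E(X^2) &= \sum_{x=0}^{\infty} x^2\,p(x) = \sum_{x=0}^{\infty} x^2\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=0}^{\infty} [x(x-1) + x]\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=0}^{\infty} x(x-1)\,\frac{e^{-\lambda}\lambda^x}{x!} + \sum_{x=0}^{\infty} x\,\frac{e^{-\lambda}\lambda^x}{x!} \\
&= \sum_{x=2}^{\infty} \frac{e^{-\lambda}\lambda^x}{(x-2)!} + E(X) \\
&= e^{-\lambda}\lambda^2 \sum_{x=2}^{\infty} \frac{\lambda^{x-2}}{(x-2)!} + \lambda \\
&= e^{-\lambda}\lambda^2\left[1 + \frac{\lambda}{1!} + \frac{\lambda^2}{2!} + \cdots\right] + \lambda \\
&= e^{-\lambda}\lambda^2 e^{\lambda} + \lambda = \lambda^2 + \lambda
\end{aligned}
\]
\[ Var(X) = E(X^2) - [E(X)]^2 = \lambda^2 + \lambda - \lambda^2 = \lambda \]
\[ S.D. = \sqrt{\lambda} \]
The Mean and Variance of Poisson Distribution are given by
Mean, µ = λ, Variance, V(X) or σ² = λ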
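The result that the mean and the variance of a Poisson distribution are both λ can be checked numerically by summing enough terms of the series. This is only a sketch: `poisson_pmf` is a name introduced here, λ = 2 is an arbitrary choice, and the series is cut off at 60 terms, which leaves a negligible tail for a small λ like this.

```python
from math import exp, factorial

def poisson_pmf(x, lam):
    """P(X = x) = e^(-lam) * lam^x / x! for x = 0, 1, 2, ..."""
    return exp(-lam) * lam ** x / factorial(x)

lam = 2.0
terms = range(60)  # truncation of the infinite sum (assumption of this sketch)

total = sum(poisson_pmf(x, lam) for x in terms)
mean = sum(x * poisson_pmf(x, lam) for x in terms)
second_moment = sum(x * x * poisson_pmf(x, lam) for x in terms)
variance = second_moment - mean ** 2

print(round(total, 6), round(mean, 6), round(variance, 6))
```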
Solution: We know $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
Given λ = 2
\[ P(X = 0) = \frac{e^{-2}\,2^0}{0!} = e^{-2} = 0.1353 \]
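Problem 28. In a Poisson distribution, if P(x = 3) = P(x = 2), find P(x = 0)
Solution: We know $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
\[
\begin{aligned}
P(x = 3) = P(x = 2) \ (\text{given}) &\Rightarrow \frac{e^{-\lambda}\lambda^3}{3!} = \frac{e^{-\lambda}\lambda^2}{2!} \\
&\Rightarrow \frac{\lambda}{6} = \frac{1}{2} \\
&\Rightarrow \lambda = 3
\end{aligned}
\]
\[ \therefore P(x = 0) = \frac{e^{-3}\,3^0}{0!} = e^{-3} = 0.0498 \]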
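Problem 29. If X is a Poisson variable with $P(X = 2) = \frac{2}{3}P(X = 1)$, find
P(X = 3) and P(X = 0)
Solution: We know $P(X = x) = \frac{e^{-\lambda}\lambda^x}{x!}$
Given that
\[
\begin{aligned}
P(X = 2) = \frac{2}{3}P(X = 1) &\Rightarrow \frac{e^{-\lambda}\lambda^2}{2!} = \frac{2}{3} \cdot \frac{e^{-\lambda}\lambda^1}{1!} \\
&\Rightarrow \frac{\lambda}{2} = \frac{2}{3} \Rightarrow \lambda = \frac{4}{3}
\end{aligned}
\]
\[ \therefore P(X = 3) = \frac{e^{-4/3}\left(\frac{4}{3}\right)^3}{3!} = 0.1041 \]
\[ P(X = 0) = \frac{e^{-4/3}\left(\frac{4}{3}\right)^0}{0!} = e^{-4/3} = 0.2636 \]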
Problem 30. Given that 2% of fuses manufactured by a firm are defective, find by
using Poisson distribution, the probability that a box containing 200 fuses has (i) no
defective fuses (ii) 3 or more defective fuses (iii) at least one defective fuse [VTU
August 2022, Dec 2018, Dec 2010]
Solution: Here p = probability of a defective fuse = 2% = 2/100 = 0.02
n = no. of fuses = 200
Mean number of defectives λ = np = 200 × 0.02 = 4. The Poisson distribution
is given by
\[ P(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{4^x e^{-4}}{x!} \]
(i) Probability of no defective fuse is
\[ P(0) = \frac{4^0 e^{-4}}{0!} = 0.0183 \]
(ii) Probability of 3 or more defective fuses,
\[
\begin{aligned}
P(X \ge 3) &= 1 - P(X < 3) = 1 - [P(0) + P(1) + P(2)] \\
&= 1 - \left[\frac{4^0 e^{-4}}{0!} + \frac{4^1 e^{-4}}{1!} + \frac{4^2 e^{-4}}{2!}\right] \\
&= 1 - e^{-4}\left[1 + \frac{4^1}{1!} + \frac{4^2}{2!}\right] \\
&= 1 - e^{-4}(1 + 4 + 8) = 0.7619
\end{aligned}
\]
(iii) Probability of at least one defective fuse,
\[ P(X \ge 1) = 1 - P(0) = 1 - 0.0183 = 0.9817 \]
of 1/500 for any blade to be defective. The blades are supplied in packets of 10. Use
Poisson distribution to calculate the approximate number of packets containing (i) no
defective (ii) one defective (iii) two defectives
in a consignment of 10000 packets
[VTU: Model 2023, Model 2020, Dec/ Jan 16, 2004]
Solution : Here n = 10 and p = 1/500 = 0.002, so
λ = np = 10 × 0.002 = 0.02
Poisson distribution is $P(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{e^{-0.02}(0.02)^x}{x!}$
In a consignment of 10000 packets :
(i) no. of packets with no defective is
\[ 10000 \times p(X = 0) = 10000 \times \frac{e^{-0.02}(0.02)^0}{0!} \approx 9802 \]
(ii) no. of packets with one defective is
\[ 10000 \times p(X = 1) = 10000 \times \frac{e^{-0.02}(0.02)^1}{1!} \approx 196 \]
(iii) no. of packets with two defectives is
\[ 10000 \times p(X = 2) = 10000 \times \frac{e^{-0.02}(0.02)^2}{2!} \approx 2 \]
Problem 32. A car-hire firm has two cars which it hires out on a day-to-day basis.
The number of demands for a car is known to follow a Poisson distribution with mean 1.5.
Find the probability of a day on which (i) there is no demand for the car and (ii) the
demand is rejected.
Solution :
n = no. of cars = 2
Given that, λ = 1.5
\[ \therefore P(x) = \frac{(1.5)^x e^{-1.5}}{x!} \]
(i) The probability of a day on which there is no demand for the car is
\[ P(0) = \frac{(1.5)^0 e^{-1.5}}{0!} = e^{-1.5} = 0.2231 \]
(ii) The demand will be rejected if the no. of demands is more than the no. of available cars.
The probability that the demand is rejected is,
\[
\begin{aligned}
P(X > 2) &= 1 - P(X \le 2) = 1 - [P(0) + P(1) + P(2)] \\
&= 1 - \left[\frac{(1.5)^0 e^{-1.5}}{0!} + \frac{(1.5)^1 e^{-1.5}}{1!} + \frac{(1.5)^2 e^{-1.5}}{2!}\right] \\
&= 1 - \left[e^{-1.5} + 1.5e^{-1.5} + 1.125e^{-1.5}\right] = 0.1912
\end{aligned}
\]
Problem 33. The number of accidents in a year to taxi drivers in a city follows a
Poisson distribution with mean 3. Out of 1000 taxi drivers find approximately the
number of the drivers with (i) no accident in a year (ii) more than 3 accidents in a
year. (VTU July 2023)
Solution : Here λ = 3, so $p(x) = \frac{e^{-3}\,3^x}{x!}$
(i) Number of drivers with no accident in a year
\[ 1000 \times p(0) = 1000 \times \frac{e^{-3}\,3^0}{0!} = 49.78 \approx 50 \]
(ii) Number of drivers with more than 3 accidents in a year
\[
\begin{aligned}
1000 \times p(X > 3) &= 1000 \times \{1 - P(x \le 3)\} \\
&= 1000\,\{1 - [P(0) + P(1) + P(2) + P(3)]\} \\
&= 1000\left\{1 - e^{-3}\left[\frac{3^0}{0!} + \frac{3^1}{1!} + \frac{3^2}{2!} + \frac{3^3}{3!}\right]\right\} \\
&= 1000\,\{1 - [0.0498(1 + 3 + 4.5 + 4.5)]\} \approx 353
\end{aligned}
\]
Problem 34. The probability that a man aged 60 will live up to 70 is 0.65. What is
the probability that out of 10 men, now 60, at least 7 will live to be 70? (VTU July
2023)
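Solution : Here n = 10, p = 0.65 and q = 1 − p = 0.35, so
\[ p(x) = {}^{n}C_x\,p^x q^{n-x} = {}^{10}C_x (0.65)^x (0.35)^{10-x} \]
and X = 0, 1, 2, . . . , 10
Probability that at least 7 men will live to 70,
\[
\begin{aligned}
P(X \ge 7) &= p(7) + p(8) + p(9) + p(10) \\
&= {}^{10}C_7 (0.65)^7 (0.35)^{3} + {}^{10}C_8 (0.65)^8 (0.35)^{2} \\
&\quad + {}^{10}C_9 (0.65)^9 (0.35)^{1} + {}^{10}C_{10} (0.65)^{10} (0.35)^{0} \\
&= 0.51383
\end{aligned}
\]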
Problem 35. If the probability of a bad reaction from certain injection is 0.001,
determine the chance that out of 2000 individuals more than two will get a bad
reaction. [VTU July 2023, Jan 2018, 2008]
Problem 36. Suppose 300 misprints are randomly distributed throughout a book of
500 pages, find the probability that a given page contains (i) exactly three misprints
(ii) less than three misprints and (iii) four or more misprints. [VTU Model 2020]
Solution: We take X = the number of misprints on one page as the number of
successes.
n = Total no. of misprints = 300 and p = 1/500
(assuming that each misprint is equally likely to fall on any of the 500 pages)
Since n is large and p is small, we shall use the Poisson distribution.
Average no. of misprints per page is, λ = np = 300/500 = 0.6
Hence the pmf is $p(x) = \frac{\lambda^x e^{-\lambda}}{x!} = \frac{(0.6)^x e^{-0.6}}{x!}$
(i) P(exactly three misprints) is
\[ p(X = 3) = \frac{(0.6)^3 e^{-0.6}}{3!} = 0.0198 \]
(ii) P(less than three misprints) is
\[
\begin{aligned}
p(X < 3) &= p(0) + p(1) + p(2) \\
&= \frac{(0.6)^0 e^{-0.6}}{0!} + \frac{(0.6)^1 e^{-0.6}}{1!} + \frac{(0.6)^2 e^{-0.6}}{2!} \\
&= e^{-0.6}\left[1 + (0.6) + \frac{(0.6)^2}{2!}\right] \\
&= 0.5488\,[1 + 0.6 + 0.18] = 0.9769
\end{aligned}
\]
and (iii) P(four or more misprints) is,
\[
\begin{aligned}
p(X \ge 4) &= 1 - p(X < 4) = 1 - [p(0) + p(1) + p(2) + p(3)] \\
&= 1 - \left[\frac{(0.6)^0 e^{-0.6}}{0!} + \frac{(0.6)^1 e^{-0.6}}{1!} + \frac{(0.6)^2 e^{-0.6}}{2!} + \frac{(0.6)^3 e^{-0.6}}{3!}\right] \\
&= 1 - 0.9966 = 0.0034
\end{aligned}
\]
Problem 37. A die is thrown 8 times. Find the probability that 3 falls (i) Exactly two
times (ii) At least once (iii) At the most 7 times [VTU July 2013]
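Solution : Here n = 8, p = probability of getting 3 in one throw = 1/6 and q = 5/6.
(i) The probability that 3 falls exactly two times is
\[ P(X = 2) = {}^{8}C_2 \left(\frac{1}{6}\right)^2 \left(\frac{5}{6}\right)^{6} = 0.2605 \]
(ii) The probability that 3 falls at least once is
\[ P(X \ge 1) = 1 - P(0) = 1 - {}^{8}C_0 \left(\frac{1}{6}\right)^0 \left(\frac{5}{6}\right)^{8} = 0.7674 \]
(iii) The probability that 3 falls at the most 7 times is
\[ P(X \le 7) = 1 - P(8) = 1 - {}^{8}C_8 \left(\frac{1}{6}\right)^8 \left(\frac{5}{6}\right)^{0} = 0.9999 \approx 1 \]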
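Problem 38. The mean and standard deviation of the number of correctly answered
questions in a test given to 4096 students are 2.5 and $\sqrt{1.875}$. Find an estimate of
the number of candidates answering correctly (i) 8 or more questions (ii) 2 or less
(iii) 5 questions.
Solution : Here np = 2.5 and npq = 1.875, so q = 1.875/2.5 = 0.75, p = 0.25 and
n = 2.5/0.25 = 10. Hence $p(x) = {}^{10}C_x (0.25)^x (0.75)^{10-x}$.
(i) number of candidates answering 8 or more questions correctly is,
\[ 4096\,p(X \ge 8) = 4096\,[p(8) + p(9) + p(10)] = 1.703 \approx 2 \]
(ii) number of candidates answering 2 or less questions correctly is,
\[
\begin{aligned}
4096\,p(X \le 2) &= 4096\,[p(0) + p(1) + p(2)] \\
&= 4096\left[{}^{10}C_0 (0.25)^0 (0.75)^{10} + {}^{10}C_1 (0.25)^1 (0.75)^{9} + {}^{10}C_2 (0.25)^2 (0.75)^{8}\right] \\
&= 2152.8 \approx 2153
\end{aligned}
\]
i.e. No. of students correctly answering 2 or less than 2 questions is 2153.
(iii) number of candidates answering 5 questions correctly is,
\[ 4096\,p(5) = 4096 \times {}^{10}C_5 (0.25)^5 (0.75)^{5} = 239.2 \approx 239 \]
i.e. Number of students correctly answering 5 questions is 239.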
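Problem 38 first recovers n and p from the mean np and the variance npq, and then scales binomial probabilities by the number of students. The sketch below follows those steps; all names are chosen here for illustration.

```python
from math import comb

mean, variance = 2.5, 1.875   # np and npq from the problem
students = 4096

q = variance / mean           # npq / np
p = 1 - q
n = round(mean / p)           # number of questions

def pmf(x):
    """Binomial probability of exactly x correct answers."""
    return comb(n, x) * p ** x * q ** (n - x)

at_least_8 = students * sum(pmf(x) for x in range(8, n + 1))
at_most_2 = students * sum(pmf(x) for x in range(0, 3))
exactly_5 = students * pmf(5)

print(n, p, q)
print(round(at_least_8, 3), round(at_most_2, 1), round(exactly_5, 1))
```

The expected counts are not whole numbers, so they are rounded to the nearest student when reporting the estimate.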
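Hence
\[ f(x) = \begin{cases} \dfrac{1}{3}e^{-x/3}, & 0 < x < \infty \\ 0, & \text{otherwise} \end{cases} \]
(i)
\[
\begin{aligned}
P(x > 1) &= 1 - P(x \le 1) = 1 - \int_0^1 f(x)\,dx = 1 - \int_0^1 \frac{1}{3}e^{-x/3}\,dx \\
&= 1 + \left[e^{-x/3}\right]_0^1 = e^{-1/3} = 0.7165
\end{aligned}
\]
(ii)
\[ P(x < 3) = \int_0^3 f(x)\,dx = \int_0^3 \frac{1}{3}e^{-x/3}\,dx = \left[-e^{-x/3}\right]_0^3 = 1 - \frac{1}{e} = 0.6321 \]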
Problem 1.13.2. In a certain town, duration of a shower is exponentially distributed
with mean 5 minutes. What is the probability that the shower will last for (i) 10
minutes or more (ii) less than 10 minutes (iii) Between 10 and 12 minutes [VTU
June 2019, Jan 2014, July 2013]
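Solution: The p.d.f. of the exponential distribution is given by $f(x) = \alpha e^{-\alpha x}$, x > 0,
and the mean is 1/α.
By data, 1/α = 5 ∴ α = 1/5
Hence $f(x) = \frac{1}{5}e^{-x/5}$
(i)
\[ P(x \ge 10) = \int_{10}^{\infty} \frac{1}{5}e^{-x/5}\,dx = \left[-e^{-x/5}\right]_{10}^{\infty} = e^{-2} = 0.1353 \]
(ii)
\[ P(x < 10) = \int_0^{10} \frac{1}{5}e^{-x/5}\,dx = \left[-e^{-x/5}\right]_0^{10} = -\left(e^{-2} - 1\right) = 1 - e^{-2} = 0.8647 \]
(iii)
\[ P(10 < x < 12) = \int_{10}^{12} \frac{1}{5}e^{-x/5}\,dx = \left[-e^{-x/5}\right]_{10}^{12} = -\left(e^{-12/5} - e^{-2}\right) = 0.0446 \]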
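For an exponential density αe^(−αx), every probability of this kind follows from the closed form F(x) = 1 − e^(−αx) for x > 0. A short sketch for the shower-duration example with mean 5 minutes is given below; `exp_cdf` is a name introduced for illustration.

```python
from math import exp

def exp_cdf(x, alpha):
    """CDF of the exponential distribution: 1 - e^(-alpha*x) for x > 0."""
    return 1 - exp(-alpha * x) if x > 0 else 0.0

alpha = 1 / 5  # mean = 1/alpha = 5 minutes

p_ge_10 = 1 - exp_cdf(10, alpha)                    # 10 minutes or more
p_lt_10 = exp_cdf(10, alpha)                        # less than 10 minutes
p_10_12 = exp_cdf(12, alpha) - exp_cdf(10, alpha)   # between 10 and 12 minutes

print(round(p_ge_10, 4), round(p_lt_10, 4), round(p_10_12, 4))
```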
Solution: We have $f(x) = \alpha e^{-\alpha x}$, x > 0; Mean = 1/α
By data, 1/α = 5 ⇒ α = 1/5
Hence $f(x) = \frac{1}{5}e^{-x/5}$ is the p.d.f.
(i)
\[ P(x < 5) = \int_0^5 f(x)\,dx = \int_0^5 \frac{1}{5}e^{-x/5}\,dx = \left[-e^{-x/5}\right]_0^5 = 1 - \frac{1}{e} = 0.6321 \]
(ii)
\[ P(5 < x < 10) = \int_5^{10} f(x)\,dx = \int_5^{10} \frac{1}{5}e^{-x/5}\,dx = -\left[e^{-x/5}\right]_5^{10} = \frac{1}{e} - \frac{1}{e^2} = 0.2325 \]
Problem 1.13.4. The length of a telephone conversation is exponentially distributed
with a mean of 2 minutes. Find the probability that a call (i) ends in more than
3 minutes (ii) ends in less than 4 minutes and (iii) takes between 3 and 5 minutes
Solution : By data, 1/α = 2 ⇒ α = 1/2. Hence
\[ f(x) = \begin{cases} \dfrac{1}{2}e^{-x/2}, & \text{for } x \ge 0 \\ 0, & \text{for } x < 0 \end{cases} \]
Problem 1.13.5. The daily turn over in a medical shop is exponentially distributed
with Rs.6000 as the average with a net profit of 8%. Find the probability that the
net profit exceeds Rs. 500 on a randomly chosen day.
Solution: Let x be the random variable denoting the turn over per day.
Given $\frac{1}{\alpha} = 6000 \Rightarrow \alpha = \frac{1}{6000}$
Let A be the turn over in rupees for which the net profit is Rs. 500. Given that net
profit is 8% of the turn over.
Hence we have
\[ \frac{8}{100} \times A = 500 \Rightarrow A = 6250 \]
i.e. to get a net profit of Rs. 500 the required turn over is 6250 rupees.
For the profit to exceed Rs. 500 the turn over has to exceed Rs. 6250.
Hence the probability that the net profit exceeds Rs. 500 is given by
\[
\begin{aligned}
P(x > 6250) &= 1 - P(x \le 6250) \\
&= 1 - \int_0^{6250} p(x)\,dx \\
&= 1 - \int_0^{6250} \frac{1}{6000} e^{-x/6000}\,dx \\
&= e^{-6250/6000} = 0.353
\end{aligned}
\]
Problem 1.13.6. The sales per day in a shop are exponentially distributed with the
average sale amounting to Rs. 100 and net profit is 8%. Find the probability that
the net profit exceeds Rs. 30 on two consecutive days.
Solution : Let x be the random variable of the sale in the shop. Since x is an
exponential variate, the p.d.f. is $f(x) = \alpha e^{-\alpha x}$, $x > 0$
Mean = 1/α = 100 ∴ α = 1/100 = 0.01
Hence $f(x) = 0.01 e^{-0.01x}$, $x > 0$
Let A be the amount of sale for which profit is Rs 30. Given that profit rate is 8%.
\[ \Rightarrow A \times \frac{8}{100} = 30 \quad \therefore A = 375 \]
Probability of profit exceeding Rs. 30 on a single day is $P(x > 375) = e^{-3.75}$
For two consecutive days, the probability of profit exceeding Rs. 30 is $e^{-3.75} \times e^{-3.75} = 0.00055$
\[ P(x) = \frac{1}{\sigma\sqrt{2\pi}}\, e^{-\frac{(x-\mu)^2}{2\sigma^2}}, \quad -\infty < x < \infty \]
Mean = µ
Variance = σ²
Standard Deviation = σ
Normal curve
The normal curve is a bell-shaped curve symmetric about the line x = µ as shown
in the following figure.
The line x = µ divides the total area under the curve which is equal to 1 into two
equal parts. The area to the right as well as to the left of the line x = µ is 0.5
For standardizing a Normal Random Variable, we use
\[ z = \frac{x-\mu}{\sigma} \]
The two tails of the standard normal probability distribution extend indefinitely and
never touch the horizontal axis. The total area under this curve from z = −∞ to
∞ is 1 unit.
By symmetry,
the area under the standard normal curve (from z = −∞ to z = 0)= 0.5 and
Area from (z = 0 to z = ∞)=0.5
Evaluation of Probability in Normal Distribution :
In the case of normal distribution we have,
\[ P(a \le x \le b) = \frac{1}{\sigma\sqrt{2\pi}} \int_a^b e^{-(x-\mu)^2/2\sigma^2}\,dx \]
This integral has no closed form in terms of elementary functions, and evaluating it
by numerical integration for every problem is tedious.
Hence we standardize the variable and use tabulated areas instead.
To find any given probability of a Normal random variable X, we convert X to the
Standard Normal variable z using $z = \frac{x-\mu}{\sigma}$
Hence P (a ≤ x ≤ b) can be written as $P\left(\frac{a-\mu}{\sigma} \le z \le \frac{b-\mu}{\sigma}\right)$
If $\frac{a-\mu}{\sigma} = z_1$ and $\frac{b-\mu}{\sigma} = z_2$, then
P (a < X < b)
= P (z1 ≤ z ≤ z2 )
= Area from z = z1 to z = z2 under standard normal curve
= Area from (z = 0 to z = z2 ) − Area from (z = 0 to z = z1 )
(under standard normal curve)
= A (z2 ) − A (z1 ) (provided both z1 and z2 are > 0)
where
\[ A(z) = \int_0^z \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2}\,dz \]
For A(z) we use the Standard Normal table which gives Areas under the curve from
0 to z.
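The tabulated area A(z) can also be computed numerically, since A(z) = (1/2) erf(z/√2), where erf is the error function. The sketch below uses Python's standard `math.erf`; the function names `A` and `normal_prob` and the sample values µ = 6, σ = 5 are illustrative assumptions.

```python
import math

def A(z):
    """Area under the standard normal curve from 0 to z.

    For negative z the value is negative, so A(z2) - A(z1) gives the area
    between z1 and z2 whatever their signs.
    """
    return 0.5 * math.erf(z / math.sqrt(2))

def normal_prob(a, b, mu, sigma):
    """P(a <= X <= b) for a normal variate X with mean mu and S.D. sigma."""
    z1 = (a - mu) / sigma
    z2 = (b - mu) / sigma
    return A(z2) - A(z1)

print(round(A(1), 4), round(A(2), 4))        # table entries for z = 1 and z = 2
print(round(normal_prob(0, 8, 6, 5), 4))     # sample values mu = 6, sigma = 5
```

Small differences in the fourth decimal place against hand calculations are expected, because the printed table is rounded.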
Problem 39. If X is a normal variate with mean 80 and S.D 10, compute P (X ≤
100).
Solution : We know $Z = \frac{X-\mu}{\sigma}$
When X = 100, $Z = \frac{100-80}{10} = 2$
∴ P (X ≤ 100) = P (Z ≤ 2)
= P (−∞ < Z < 0) + P (0 < Z < 2)
= 0.5 + 0.4772 (from table)
= 0.9772
Problem 40. If X is normally distributed with mean 6 and standard deviation 5,
find P(0 ≤ X ≤ 8)
Solution :
P(0 ≤ X ≤ 8) = P(−1.2 ≤ Z ≤ 0.4)
= P(−1.2 ≤ Z ≤ 0) + P(0 ≤ Z ≤ 0.4)
= P(0 ≤ Z ≤ 1.2) + P(0 ≤ Z ≤ 0.4)
(by symmetry)
= 0.3849 + 0.1554 (from the table)
= 0.5403
Problem 41. When X is normally distributed with mean 12, standard deviation is
4. Find (i) P(X ≥ 20) (ii) P (0 < X < 12) (iii) P (X ≤ 20)
(ii)
P(0 < X < 12) = P(−3 < Z < 0)
= P (0 < Z < 3) = 0.4987
(iii)
P(X ≤ 20) = P(Z ≤ 2)
= P(−∞ < Z < 0) + P(0 < Z < 2)
= 0.5 + 0.4772 = 0.9772
Problem 42. For the normal distribution with mean 2 and standard deviation 4,
evaluate the following probabilities (i) P (X ≥ 5) (ii) P (|X| < 4) (iii)
P (|X| > 3)
Solution : (i) P (X ≥ 5): when X = 5, $z = \frac{5-2}{4} = \frac{3}{4} = 0.75$
P (X ≥ 5) = P (z ≥ 0.75)
= 0.5 − P (0 < z < 0.75)
= 0.5 − 0.2734
= 0.2266
(ii)
P (|X| < 4) = P (−4 < X < 4)
= P (−1.5 < Z < 0.5)
= Area(−1.5 to 0) + Area(0 to 0.5)
= Area(0 to 1.5) + Area(0 to 0.5)
= 0.4332 + 0.1915 = 0.6247
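(iii)
\[
\begin{aligned}
P(|X| > 3) &= 1 - P(|X| \le 3) \\
&= 1 - P(-3 \le X \le 3) \\
&= 1 - P\left(\frac{-3-\mu}{\sigma} \le \frac{X-\mu}{\sigma} \le \frac{3-\mu}{\sigma}\right) \\
&= 1 - P\left(\frac{-3-2}{4} \le Z \le \frac{3-2}{4}\right) \\
&= 1 - P(-1.25 \le Z \le 0.25) \\
&= 1 - \text{Area}(-1.25 \text{ to } 0.25) \\
&= 1 - [\text{Area}(-1.25 \text{ to } 0) + \text{Area}(0 \text{ to } 0.25)] \\
&= 1 - [\text{Area}(0 \text{ to } 1.25) + \text{Area}(0 \text{ to } 0.25)] \\
&= 1 - [0.3944 + 0.0987] = 0.5069
\end{aligned}
\]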
Problem 43. In a test on electric bulbs, it was found that the life time of a particular
brand distributed normally with an average life of 2000 hours and S.D. of 60 hours.
If a firm purchases 2500 bulbs find the number of bulbs that are likely to last for
(i) more than 2100 hours (ii) less than 1950 hours (iii) between 1900 to 2100 hours
[VTU: July 2023, Model 2020, June/July 17]
P (x > 2100) = P (z > 1.67)
= P (z ≥ 0) − P (0 < z < 1.67)
= 0.5 − ϕ(1.67)
= 0.5 − 0.4525 = 0.0475
Hence number of bulbs that are likely to last for more than 2100 hours is
2500 × 0.0475 = 118.75 ≈ 119
Solution : Let x represent the marks of students. By data µ = 70, σ = 5
Hence standard Normal variable is $z = \frac{x-\mu}{\sigma} = \frac{x-70}{5}$
(i) If x = 65, z = −1 and we have to find P (z < −1)
P (z < −1) = P (z > 1)
= P (z ≥ 0) − P (0 < z < 1)
= 0.5 − A(1) = 0.5 − 0.3413 = 0.1587
Number of students scoring less than 65 marks = 1000×0.1587 = 158.7 ≈ 159
(ii) If x = 75, z = 1 and we have to find P (z > 1)
P (z > 1) = P (z ≥ 0) − P (0 < z < 1)
= 0.5 − A(1) = 0.5 − 0.3413 = 0.1587
∴ the number of students scoring more than 75 marks = 1000 × 0.1587 =
158.7 ≈ 159
(iii) We have to find P (−1 < z < 1)
P (−1 < z < 1) = 2P (0 < z < 1)
= 2A(1) = 2(0.3413) = 0.6826
∴ number of students scoring marks between 65 and 75 = 1000 × 0.6826 =
682.6 ≈ 683
Problem 45. In a test on 2000 electric bulbs, it was found that the life of a particular
make was normally distributed with an average life of 2040 hours and SD of 60
hours. Estimate the number of bulbs likely to burn for (i) More than 2150 hours (ii)
Less than 1950 hours (iii) More than 1920 hours but less than 2160 hours. Given
A(1.5) = 0.4332, A(1.83) = 0.4664, A(2) = 0.4772 [VTU: Dec 2018, Dec14/ Jan 15]
Thus number of bulbs likely to burn for more than 2150 hours = 2000 × 0.0336 ≈
67
(ii) P(bulb likely to burn for less than 1950 hours) is
\[
\begin{aligned}
p(X < 1950) &= p\left(\frac{X-\mu}{\sigma} < \frac{1950-\mu}{\sigma}\right) \\
&= p(Z < -1.5) \\
&= \text{Area}(-\infty \text{ to } -1.5) \\
&= \text{Area}(1.5 \text{ to } \infty) \\
&= \text{Area}(0 \text{ to } \infty) - \text{Area}(0 \text{ to } 1.5) \\
&= 0.5 - 0.4332 = 0.0668
\end{aligned}
\]
Thus number of bulbs likely to burn for less than 1950 hours = 2000 × 0.0668 ≈
134
(iii) P(bulb likely to burn for more than 1920 hours but less than 2160 hours) is,
\[
\begin{aligned}
p(1920 < X < 2160) &= p\left(\frac{1920-\mu}{\sigma} < \frac{X-\mu}{\sigma} < \frac{2160-\mu}{\sigma}\right) \\
&= p(-2 < z < 2) \\
&= 2 \times \text{Area}(0 \text{ to } 2) \\
&= 2 \times 0.4772 = 0.9544
\end{aligned}
\]
Thus number of bulbs likely to burn for more than 1920 hours but less than 2160
hours = 2000 × 0.9544 ≈ 1909
Problem 46. In a normal distribution 31% of items are under 45 and 8% of the items
are over 64. Find the mean and S.D. of the distribution. [VTU: July 2023, Dec/ Jan
2016, Dec 2012, 2009]
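Solution : Let µ and σ be the mean and S.D. of the normal distribution. By data
P (x < 45) = 0.31 and P (x > 64) = 0.08
Now
\[
\begin{aligned}
P(x < 45) = 0.31 &\Rightarrow P\left(z < \frac{45-\mu}{\sigma}\right) = 0.31 \\
&\Rightarrow \text{Area}\left(-\infty \text{ to } \frac{45-\mu}{\sigma}\right) = 0.31 \\
&\Rightarrow \text{Area}\left(\frac{\mu-45}{\sigma} \text{ to } \infty\right) = 0.31 \\
&\Rightarrow 0.5 - \text{Area}\left(0 \text{ to } \frac{\mu-45}{\sigma}\right) = 0.31 \\
&\Rightarrow A\left(\frac{\mu-45}{\sigma}\right) = 0.5 - 0.31 = 0.19 \qquad (\text{But } 0.19 \approx A(0.5)) \\
&\Rightarrow \frac{\mu-45}{\sigma} = 0.5 \\
&\Rightarrow \mu - 45 = 0.5\sigma \\
&\Rightarrow \mu - 0.5\sigma = 45 \qquad (*)
\end{aligned}
\]
\[
\begin{aligned}
P(x > 64) = 0.08 &\Rightarrow P\left(z > \frac{64-\mu}{\sigma}\right) = 0.08 \\
&\Rightarrow \text{Area}\left(\frac{64-\mu}{\sigma} \text{ to } \infty\right) = 0.08 \\
&\Rightarrow \text{Area}(0 \text{ to } \infty) - \text{Area}\left(0 \text{ to } \frac{64-\mu}{\sigma}\right) = 0.08 \\
&\Rightarrow 0.5 - A\left(\frac{64-\mu}{\sigma}\right) = 0.08 \\
&\Rightarrow A\left(\frac{64-\mu}{\sigma}\right) = 0.5 - 0.08 = 0.42 \qquad (\text{But } 0.42 \approx A(1.4)) \\
&\Rightarrow \frac{64-\mu}{\sigma} = 1.4 \\
&\Rightarrow 64 - \mu = 1.4\sigma \\
&\Rightarrow \mu + 1.4\sigma = 64 \qquad (**)
\end{aligned}
\]
Now we have two equations with two unknowns. They are
µ − 0.5σ = 45 (∗)
and
µ + 1.4σ = 64 (∗∗),
Solving these two, we get µ = 50, σ = 10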
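The last step, solving the pair (∗) and (∗∗), is a two-equation elimination: each condition has the form µ + zσ = x, where z is read from the table. A minimal sketch, assuming the z-values −0.5 and 1.4 found above; the function name `solve_mean_sd` is illustrative.

```python
def solve_mean_sd(x1, z1, x2, z2):
    """Solve mu + z1*sigma = x1 and mu + z2*sigma = x2 for (mu, sigma)."""
    sigma = (x2 - x1) / (z2 - z1)   # subtract the two equations to eliminate mu
    mu = x1 - z1 * sigma            # back-substitute into the first equation
    return mu, sigma

# (*)  mu - 0.5*sigma = 45   and   (**)  mu + 1.4*sigma = 64
mu, sigma = solve_mean_sd(45, -0.5, 64, 1.4)
print(round(mu, 2), round(sigma, 2))
```

The same call with (35, −1.48) and (60, 1.23) handles problems where both conditions are given as "less than" percentages.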
Problem 47. In an examination, 7% of students score less than 35% marks and 89%
of students score less than 60% marks. Find the mean and standard deviation, if the
marks are normally distributed. It is given that P (0 < z < 1.23) = 0.39 and
P (0 < z < 1.48) = 0.43 [Jan 2020, VTU Jan 2012]
Solution : Let µ and σ be the mean and S.D of the normal distribution.
By data we have P (x < 35) = 0.07, P (x < 60) = 0.89
We have $z = \frac{x-\mu}{\sigma}$
When x = 35, $z = \frac{35-\mu}{\sigma} = z_1$ (say)
When x = 60, $z = \frac{60-\mu}{\sigma} = z_2$ (say)
Hence we have
P (x < 35) = 0.07 ⇒ P (z < z1 ) = 0.07
⇒ Area(−∞ to z1 ) = 0.07
⇒ Area(−z1 to ∞) = 0.07
⇒ 0.5 − A(−z1 ) = 0.07
⇒ A(−z1 ) = 0.5 − 0.07 = 0.43
(But 0.43 ≈ A(1.48))
⇒ −z1 = 1.48
⇒ $-\frac{35-\mu}{\sigma} = 1.48$
⇒ µ − 1.48σ = 35 (∗)
P (x < 60) = 0.89 ⇒ P (z < z2 ) = 0.89
⇒ Area(−∞ to z2 ) = 0.89
⇒ Area(−∞ to 0) + Area(0 to z2 ) = 0.89
(Here the area measured from −∞ to z2 is more than 0.5)
(Hence z2 lies to the right of the y axis)
⇒ 0.5 + A(z2 ) = 0.89
⇒ A(z2 ) = 0.89 − 0.5 = 0.39
(But 0.39 ≈ A(1.23))
⇒ z2 = 1.23
⇒ $\frac{60-\mu}{\sigma} = 1.23$
⇒ µ + 1.23σ = 60 (∗∗)
Solving,
µ − 1.48σ = 35 (∗)
and
µ + 1.23σ = 60 (∗∗)
we get µ = 48.65 and σ = 9.23
Random Variables
x       −2    −1    0     1     2     3
P(X)    0.1   k     0.2   2k    0.3   k
Find k, Mean, Variance, SD [VTU July 2023, Model 2020, 2004] Ans :
0.1, 0.8, 2.16, 1.47
4. A random variable X takes the values −3, −2, −1, 0, 1, 2, 3 such that
P (X = 0) = P (X < 0) and
P (X = −3) = P (X = −2) = P (X = −1)=
X       0    1    2    3    4    5     6
p(X)    k    3k   5k   7k   9k   11k   13k
(i) Find P (X < 4), P (X ≥ 5), P (3 < X ≤ 6) [VTU July 2013, 2010]
Ans : 16/49, 24/49, 33/49
11. If $f(x) = \begin{cases} \frac{1}{2}(x+1), & -1 < x < 1 \\ 0, & \text{elsewhere} \end{cases}$ represents the density function of a
random variable X, find E(X) and Var(X)
(i) If so, determine the probability that the variate having this density will fall
in the interval (1,2).
(ii) Also, find the cumulative probability function F (2) [VTU July 2017]
13. Find the constant k such that the function $f(x) = \begin{cases} kx^2, & 0 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases}$ is a
p.d.f. Also find (i) P (1 < X < 2) (ii) P (X ≤ 1) (iii) P (X > 1) (iv)
Mean (v) Variance [VTU Jan 2018]
Ans : k = 1/9, (i) 7/27, (ii) 1/27, (iii) 26/27, (iv) 9/4, (v) 27/80
14. Find the constant ‘c’ such that the function $p(x) = \begin{cases} cx^2, & 0 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases}$ is a
p.d.f. Also find (i) P (1 < X < 2) (ii) P (X ≤ 1) (iii) P (X > 1) [VTU
Model 2020, Jan 2018]
15. Suppose a random variable X takes the values -3, -1, 2 and 5 with respective
probabilities (2k−3)/10, (k−2)/10, (k−1)/10, (k+1)/10. Find the value of k and (i)
P (−3 < X < 4) (ii) P (X ≤ 2)
[VTU June 2010]
18. A random variable x has the following density function
$P(x) = \begin{cases} kx^2, & -3 \le x \le 3 \\ 0, & \text{elsewhere} \end{cases}$
Evaluate k and find (i) P (1 ≤ x ≤ 2) (ii) P (x ≤ 2) (iii) P (x > 1) [VTU Model 2020]
19. If the random variable X takes the values 1, 2, 3 and 4 such that
2P (X = 1) = 3P (X = 2) = P (X = 3) = 5P (X = 4). find the
probability distribution function and cumulative distribution function of X.
20. A coin is tossed twice. A random variable X represents the number of heads
turning up. Find the discrete probability distribution for X. Also find its mean
and variance. Ans : Mean = 1 and Variance = 1/2
Ans : (a) 3/2; (b) $F(x) = \begin{cases} 0, & x < 0 \\ x^{3/2}, & 0 \le x < 1 \\ 1, & x \ge 1 \end{cases}$
24. The random variable X, representing the number of errors per 100 lines of
software code, has the following probability distribution:
x 2 3 4 5 6
f (x) 0.01 0.25 0.4 0.3 0.04
find the variance of X. Ans : 0.74
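Answers to the discrete exercises above can be checked with the definitions E(X) = Σ x p(x) and Var(X) = E(X²) − (E(X))². The sketch below applies them to the table of exercise 24; the function name `mean_and_variance` is illustrative.

```python
def mean_and_variance(values, probs):
    """Mean and variance of a discrete random variable from its probability table."""
    mean = sum(x * p for x, p in zip(values, probs))
    second_moment = sum(x * x * p for x, p in zip(values, probs))
    return mean, second_moment - mean ** 2

# Errors per 100 lines of code (exercise 24)
xs = [2, 3, 4, 5, 6]
ps = [0.01, 0.25, 0.4, 0.3, 0.04]

mean, variance = mean_and_variance(xs, ps)
print(round(mean, 2), round(variance, 2))
```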
1. Derive the expressions for mean and variance of binomial distribution. [VTU
July 2023, Dec 2018, Jan 2018, July 2017, Jan 2015, Dec 2012, June 2012]
3. Obtain the Mean and variance of the Poisson distribution. [VTU :July 2023, Jan
2020, June 2019, July 2017, Dec/ Jan 14]
5. In a certain factory turning out razor blades there is a small probability of 1/500
for any blade to be defective. The blades are supplied in packets of 10. Use the
Poisson distribution to calculate the approximate number of packets containing
(i) One defective (ii) Two defective, in a consignment of 1000 packets
[VTU:Model 2020, Dec/ Jan 16, 2004]
Ans : 9802, 196, 2
6. The probability that a bomb dropped hits the target is 0.2. Find the probability
that out of 6 bombs dropped (i) exactly two will hit the target (ii) at least two
will hit the target [VTU:June/ July 15] Ans : 0.2458, 0.3446
7. A certain screw making machine produces on average two defectives out of 100
and packs them in boxes of 500. Find the probability that the box contains
i) three defectives
ii) at least one defective
iii) between 2 and 4 defectives. [VTU: Feb 2021]
Ans : 0.2277, 0.0198, 0.057
8. A die is tossed thrice. A success is ‘getting 1 or 6’ on a toss. Find the mean and
variance of the number of successes. [VTU Dec 2011] Ans : Mean = 1,
Variance= 2/3
9. A die is thrown 8 times. Find the probability that 3 falls (i) Exactly two times (ii)
At least once (iii) At the most 7 times [VTU July 2013] Ans : 0.2605, 0.7674,
0.9999 ≈ 1
10. In sampling a large number of parts manufactured by a machine, the mean number of
defectives in a sample of 20 is 2. Out of 1000 such samples, how many would be
expected to contain at least 3 defective parts? [VTU June 2019, 2004] Ans :
323
11. Out of 800 families of 5 children each, how many would you expect to have (a) 3
boys (b) 5 girls (c) either 2 or 3 boys (d) at most 2 girls? Assume equal
probabilities for boys and girls.
[VTU August 2022, August 2021, 2004]
Ans : (a) 250, (b) 25, (c) 500, (d) 400
12. In a bombing action, there are 50% chance that any bomb will strike the target.
Two direct hits are required to destroy the target completely. How many bombs
are required to be dropped to give a 99% chance or better of completely
destroying the target? [VTU 2003]
13. If the probability of a bad reaction from certain injection is 0.001, determine the
chance that out of 200 individuals more than two will get a bad reaction. [VTU
July 2023, Jan 2018, 2008] Ans : 0.32
14. The probability that an individual suffers a bad reaction from a certain injection
is 0.001. Using Poisson distribution, determine the probability that out of 2000
individuals: (i) Exactly 3 and (ii) More than 2, will suffer a bad reaction. [VTU
July 2023, June 2012]
15. If a random variable has Poisson Distribution such that p(1) = p(2), find (i)
Mean of the Distribution (ii) P (4) [VTU 2003]
16. If 10% of rivets produced by a machine are defective. Find the probability that,
out of 12 such rivets, (i)exactly 2 are defective (ii) at least 2 are defective (iii)
none of them are defective. [VTU July 2013]
17. Given that 2% of fuses manufactured by a firm are defective, find by using
Poisson distribution, the probability that a box containing 200 fuses has (i) no
defective fuses (ii) 3 or more defective fuses (iii) at least one defective fuse
[VTU August 2022, Dec 2018, Dec 2010] Ans : 0.0183, 0.76189, 0.9817
18. Suppose 300 misprints are randomly distributed throughout a book of 500 pages,
find the probability that a given page contains (i) exactly three misprints (ii) less
than three misprints and (iii) four or more misprints. [VTU Model 2020] Ans :
0.0198, 0.9768, 0.0034
19. If 20% of the bolts produced by a machine are defective, calculate the probability that
out of 7 randomly selected bolts not more than 1 is defective, and that at most two are
defective. [VTU August 2021]
22. A car hire -firm has two cars which it hires out on a day to day basis. The
number of demands for a car is known to be Poisson distribution with mean 1.5
Find the probability of day on which (i) There is no demand for the car and (ii)
The demand is rejected. Ans : 0.2231, 0.1912
23. The number of accidents in a year to taxi drivers in a city follows a Poisson
distribution with mean 3. Out of 1000 taxi drivers find approximately the
number of the drivers with (i) no accident in a year (ii) more than 3 accidents in
a year. (VTU July 2023)
Ans : 50, 350
24. The probability that a man aged 60 will live up to 70 is 0.65. What is the
probability that out of 10 men, now 60, at least 7 will live to be 70? (VTU July
2023)
Ans : 0.51383
25. If the mean and standard deviation of the number of correctly answered
questions in a test given to 4096 students are 2.5 and √1.875. Find an estimate
26. In a quiz contest of answering 'Yes' or 'No', what is the probability of guessing
at least 6 answers correctly out of 10 questions asked? Also find the probability
of the same if there are 4 options for a correct answer? (VTU July 2023)
Ans : 0.377, 0.0197
27. The probability that a news reader commits no mistake in reading the news is 1/e³.
Find the probability that on a particular news broadcast he commits (i) only 2
mistakes (ii) more than 3 mistakes (iii) at most 3 mistakes, assuming that
mistakes follow Poisson distribution. (VTU 2023)
Ans : λ = 3, 0.2240, 0.3528, 0.6472
28. The number of telephone lines busy at an instant of time is a binomial variate
with probability 0.1 that a line is busy. If 10 lines are chosen at random, what is
the probability that, (i) no line is busy (ii) all lines are busy (iii) at least one line
is busy (iv) at most 2 lines are busy. (VTU 2023)
Ans : 0.3487, (0.1)^10, 0.6513, 0.9298
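Binomial answers in this exercise set can be checked directly from the p.m.f. P(X = k) = nCk p^k (1 − p)^(n−k). The sketch below uses `math.comb` from the Python standard library and reproduces the quiz-contest answers of exercise 26 and the first answer of exercise 28; the function names are illustrative.

```python
import math

def binom_pmf(k, n, p):
    """P(X = k) for a binomial variate with n trials and success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)

def binom_at_least(k, n, p):
    """P(X >= k), summed term by term from the p.m.f."""
    return sum(binom_pmf(i, n, p) for i in range(k, n + 1))

# Exercise 26: at least 6 correct guesses out of 10
p_yes_no = binom_at_least(6, 10, 0.5)         # two options per question
p_four_options = binom_at_least(6, 10, 0.25)  # four options per question

# Exercise 28 (i): no line busy out of 10, with p = 0.1
p_no_line_busy = binom_pmf(0, 10, 0.1)

print(round(p_yes_no, 4), round(p_four_options, 4), round(p_no_line_busy, 4))
```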
Exponential and Normal Distributions
probability that a random call made from the booth (i) ends in less than 5
minutes (ii) lasts between 5 and 10 minutes? [VTU Model 2020, Jan 2020, Jan
2018]
5. At a certain city bus stop three buses arrive per hour on an average. Assuming
that the time between successive arrivals is exponentially distributed, find the
probability that the time between the arrival of successive buses is (i) less than
10 minutes and (ii) at least 30 minutes Ans: 0.3935, 0.2231
10. X is a normal variate with mean 30 and S.D, 5, find the probabilities that (i)
26 ≤ X ≤ 40 (ii) X ≥ 45 and (iii) |X − 30| > 5 [VTU August 2022,
June 2019]
Ans : 0.7653, 0.0014, 0.3174
11. If X is a normal variate with mean 80 and S.D 10. Compute P (X ≤ 100).
Ans : 0.9772
12. In a test on electric bulbs, it was found that the life time of a particular brand
was distributed normally with an average life of 2000 hours and S.D. of 60 hours. If
a firm purchases 2500 bulbs, find the number of bulbs that are likely to last for
(i) more than 2100 hours (ii) less than 1950 hours (iii) between 1900 to 2100
hours [VTU Model 2020, June/July 17]
13. In a normal distribution 31% of items are under 45 and 8% of the items are
over 64. Find the mean and S.D. of the distribution. [VTU: Dec/ Jan 2016,
Dec 2012, 2009] Ans : 50 and 10
14. In a test on 2000 electric bulbs, it was found that the life of a particular make
was normally distributed with an average life of 2040 hours and SD of 60
hours. Estimate the number of bulbs likely to burn for (i) More than 2150
hours (ii) Less than 1950 hours (iii) More than 1920 hours but less than 2160
hours. Given A(1.5) = 0.4332, A(1.83) = 0.4664, A(2) = 0.4772 [VTU: Dec
2018, Dec14/ Jan 15]
Ans : 0.0336 × 2000 = 67, 0.0668 × 2000 = 134,
0.9544 × 2000 = 1909
15. The life of an electric bulb is normally distributed with mean life of 2000 hours
and S.D. of 60 hours. Out of 2500 bulbs, find the number of bulbs which are
likely to last between 1900 and 2100 hours. Given that
P (0 < Z < 1.67) = 0.4525 [VTU Dec 2018, July 2017, Jan 2014]
18. In an examination, 7% of students score less than 35% marks and 89% of
students score less than 60% marks. Find the mean and standard deviation, if
the marks are normally distributed. It is given that
P (0 < z < 1.2263) = 0.39 and P (0 < z < 1.4757) = 0.43 [Jan
2020, VTU Jan 2012]
19. The weekly wages of workers in a company are normally distributed with
mean of Rs. 700/- and standard deviation of Rs. 50. Find the probability that
the weekly wage of a randomly chosen worker is (i) between Rs. 650 and Rs. 750 (ii)
more than Rs. 750 [VTU June 2012]
20. A sample of 100 dry battery cells tested to find the length of life produced by a
company and following results are recorded : Mean life = 12 hours, Standard
Deviation = 3 hours. Assuming data to be normally distributed, find the
expected life of a dry cell : (i) more than 15 hours (ii) Between 10 and 1 hours
[VTU Jan 2011]
21. Suppose that the student IQ scores form a normal distribution with mean 100
and standard deviation 20. Find the percentage of students whose (i) score is
less than 80 (ii) score falls between 90 and 140 (iii) score is more than 120
[VTU June 2011]
Joint probability distribution & Markov Chain
Syllabus :
Joint probability distribution: Joint Probability distribution for two discrete random
variables, expectation, covariance and correlation.
Markov Chain: Introduction to Stochastic Process, Probability Vectors, Stochastic
matrices, Regular stochastic matrices, Markov chains, Higher transition probabilities,
Stationary distribution of Regular Markov chains and absorbing states.
2.1 Joint probability and Joint probability distribution
• If X and Y are two discrete random variables, we define the joint probability
function of X and Y by P (X = x, Y = y) = f (x, y) (or p(x, y)),
where f (x, y) satisfies the conditions
(i) $p(x, y) \ge 0$
(ii) $\sum_{x,y} p(x, y) = 1$
x2                       p(x2, y1)   p(x2, y2)   ...   p(x2, yn)   P(x2)
x3                       p(x3, y1)   p(x3, y2)   ...   p(x3, yn)   P(x3)
·                        ·           ·                 ·           ·
·                        ·           ·                 ·           ·
·                        ·           ·                 ·           ·
xm                       p(xm, y1)   p(xm, y2)   ...   p(xm, yn)   P(xm)
Marginal Density of Y    P(y1)       P(y2)       ...   P(yn)       Sum=1
From joint pmf of X and Y we can find the pmf (or pdf ) of X without Y. This is
called a marginal pmf (or pdf ).
In the table each row represents a single value of X and each column represents a
single value of Y.
Two random Variables are said to be stochastically independent if they satisfy the
condition
p (xi , yj ) = p (xi ) p (yj )
the relation
\[ \mu_X = E(X) = \sum x\,p(x) \]
The Variance of X denoted by V (X) is defined by the relation
\[ V(X) \text{ or } \sigma_X^2 = \sum_{i=1}^{n} (x_i - \mu)^2\, p(x_i) = E(X^2) - (E(X))^2 \]
where µ is the mean of X, and $\sigma_X = \sqrt{V(X)}$ is called the standard deviation (S.D) of
X.
If X and Y are random variables having the joint probability function p(x, y), then
the expectations of X and Y are defined as follows :
\[ E(X) \text{ or } \mu_X = \sum x\,p(x), \qquad E(Y) \text{ or } \mu_Y = \sum y\,p(y) \qquad \text{and} \qquad E(XY) = \sum x\,y\,p(x, y) \]
The covariance of X and Y denoted by COV (X, Y ) is defined by the relation,
\[
\begin{aligned}
\text{COV}(X, Y) &= \sum_i \sum_j (x_i - \mu_X)(y_j - \mu_Y)\, p(x_i, y_j) \\
&= E[(X - \mu_X)(Y - \mu_Y)]
\end{aligned}
\]
COV(X, Y ) = E(XY ) − µX µY (Equivalently)
Further, the coefficient of correlation of X and Y denoted by ρ(X, Y) is defined by
the relation
\[ \rho(X, Y) = \frac{\text{COV}(X, Y)}{\sigma_X \sigma_Y} \]
Problem 48. The joint distribution of two random variables X and Y is as follows.
X\Y    −4     2      7
1      1/8    1/4    1/8
5      1/4    1/8    1/8
Determine (i) Marginal Distributions of X and Y (ii) Covariance of X and Y (iii) σx
and σy (iv)Correlation of X and Y [VTU Dec 2018, July 2017]
p(Y = −4) = 1/8 + 1/4 = 3/8
p(Y = 2) = 1/4 + 1/8 = 3/8
p(Y = 7) = 1/8 + 1/8 = 1/4
Hence distributions of X and Y are given by
xi       1     5          yj       −4    2     7
p(xi)    1/2   1/2        p(yj)    3/8   3/8   1/4
(ii)
\[ E(X) = \sum x_i\, p(x_i) = (1)(1/2) + (5)(1/2) = 3 \]
\[ E(Y) = \sum y_j\, p(y_j) = (-4)(3/8) + (2)(3/8) + (7)(1/4) = 1 \]
Thus
µX = E(X) = 3 and µY = E(Y ) = 1
\[
\begin{aligned}
E(XY) &= \sum x_i y_j P(x_i, y_j) \\
&= (1)(-4)(1/8) + (1)(2)(1/4) + (1)(7)(1/8) \\
&\quad + (5)(-4)(1/4) + (5)(2)(1/8) + (5)(7)(1/8) \\
&= -\frac{1}{2} + \frac{1}{2} + \frac{7}{8} - 5 + \frac{5}{4} + \frac{35}{8} = \frac{3}{2}
\end{aligned}
\]
\[ \text{COV}(X, Y) = E(XY) - \mu_X \mu_Y = \frac{3}{2} - (3)(1) = -\frac{3}{2} \]
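(iii)
\[ \sigma_X^2 = E(X^2) - \mu_X^2 \quad \text{and} \quad \sigma_Y^2 = E(Y^2) - \mu_Y^2 \]
Now
\[ E(X^2) = \sum x_i^2\, p(x_i) = (1)(1/2) + (25)(1/2) = 13 \]
\[ E(Y^2) = \sum y_j^2\, p(y_j) = (16)(3/8) + (4)(3/8) + (49)(1/4) = \frac{79}{4} \]
Hence
\[ \sigma_X^2 = 13 - (3)^2 = 4, \qquad \sigma_Y^2 = \frac{79}{4} - (1)^2 = \frac{75}{4} \]
Thus $\sigma_X = 2$ and $\sigma_Y = \sqrt{75/4} = 4.33$
(iv)
\[ \rho(X, Y) = \frac{\text{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{-3/2}{(2)\sqrt{75/4}} = \frac{-3}{2\sqrt{75}} = -0.1732 \]
Thus ρ(X, Y) = −0.1732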
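The whole of Problem 48 can be cross-checked mechanically from the joint table: row and column sums give the marginals, and the definitions of E(X), E(Y), E(XY) give the covariance and correlation. The sketch below uses exact fractions from the Python standard library; the variable names are illustrative.

```python
import math
from fractions import Fraction as F

xs = [1, 5]
ys = [-4, 2, 7]
# joint[i][j] = P(X = xs[i], Y = ys[j]), taken from the table of Problem 48
joint = [[F(1, 8), F(1, 4), F(1, 8)],
         [F(1, 4), F(1, 8), F(1, 8)]]

# Marginal distributions: row sums for X, column sums for Y
px = [sum(row) for row in joint]
py = [sum(joint[i][j] for i in range(len(xs))) for j in range(len(ys))]

ex = sum(x * p for x, p in zip(xs, px))
ey = sum(y * p for y, p in zip(ys, py))
exy = sum(xs[i] * ys[j] * joint[i][j]
          for i in range(len(xs)) for j in range(len(ys)))

cov = exy - ex * ey
var_x = sum(x * x * p for x, p in zip(xs, px)) - ex ** 2
var_y = sum(y * y * p for y, p in zip(ys, py)) - ey ** 2
rho = float(cov) / math.sqrt(float(var_x * var_y))

print(px, py, cov, round(rho, 4))
```

Replacing `xs`, `ys` and `joint` with another table gives the same quantities for the other joint-distribution problems in this section.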
Problem 49. The joint distribution of two random variables X and Y is as follows.
X\Y    1      3       9
2      1/8    1/24    1/12
4      1/4    1/4     0
6      1/8    1/24    1/12
Determine (i) Marginal Distributions of X and Y (ii) Covariance of X and Y [VTU
Dec 2018, June 2011]
\[ p(y = 3) = \frac{1}{24} + \frac{1}{4} + \frac{1}{24} = \frac{1}{3}, \qquad p(y = 9) = \frac{1}{12} + 0 + \frac{1}{12} = \frac{1}{6} \]
\[ E(X) = \sum_i x_i\, p(x_i) = (2)(1/4) + (4)(1/2) + (6)(1/4) = 4 \]
\[ E(Y) = \sum_j y_j\, p(y_j) = (1)(1/2) + (3)(1/3) + (9)(1/6) = 3 \]
\[ E(XY) = \sum_{i,j} x_i y_j P(x_i, y_j) \]
Problem 50. X and Y are independent random variables. X takes the values 2, 5, 7 with
probabilities 1/2, 1/4 and 1/4, respectively. Y takes the values 3, 4, 5 with probabilities
1/3, 1/3, 1/3.
(a) Find the joint probability distribution of X and Y.
(b) Show that the covariance of X and Y is equal to zero.
(c) Find the probability distribution of Z = X + Y
Hence
p(x = 2, y = 3) = p(x = 2) × p(y = 3) = 1/2 × 1/3 = 1/6
p(x = 2, y = 4) = p(x = 2) × p(y = 4) = 1/2 × 1/3 = 1/6
p(x = 2, y = 5) = p(x = 2) × p(y = 5) = 1/2 × 1/3 = 1/6
p(x = 5, y = 3) = p(x = 5) × p(y = 3) = 1/4 × 1/3 = 1/12
p(x = 5, y = 4) = p(x = 5) × p(y = 4) = 1/4 × 1/3 = 1/12
p(x = 5, y = 5) = p(x = 5) × p(y = 5) = 1/4 × 1/3 = 1/12
p(x = 7, y = 3) = p(x = 7) × p(y = 3) = 1/4 × 1/3 = 1/12
p(x = 7, y = 4) = p(x = 7) × p(y = 4) = 1/4 × 1/3 = 1/12
p(x = 7, y = 5) = p(x = 7) × p(y = 5) = 1/4 × 1/3 = 1/12
The joint distribution table is as follows.
X\Y      3       4       5       p(xi)
2        1/6     1/6     1/6     1/2
5        1/12    1/12    1/12    1/4
7        1/12    1/12    1/12    1/4
p(yj)    1/3     1/3     1/3     1
\[
\begin{aligned}
E(XY) &= \sum_{i,j} x_i y_j P(x_i, y_j) \\
&= (2)(3)(1/6) + (2)(4)(1/6) + (2)(5)(1/6) \\
&\quad + (5)(3)(1/12) + (5)(4)(1/12) + (5)(5)(1/12) \\
&\quad + (7)(3)(1/12) + (7)(4)(1/12) + (7)(5)(1/12) \\
&= 16
\end{aligned}
\]
Hence
COV (X, Y) = E(XY) − µX µY = 16 − (4)(4) = 0
(c) Let Z = X + Y
i.e. z = xi + yj and hence {z} = {5, 6, 7, 8, 9, 10, 11, 12}
The corresponding probabilities are,
Z       5     6     7     8      9      10    11     12
P(Z)    1/6   1/6   1/6   1/12   1/12   1/6   1/12   1/12
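The table for Z = X + Y is obtained by adding p(x)p(y) over every pair (x, y) with the same sum; for instance z = 10 arises from both (5, 5) and (7, 3). The sketch below repeats that bookkeeping with exact fractions, using the marginals stated in Problem 50; the dictionary names are illustrative.

```python
from collections import defaultdict
from fractions import Fraction as F

px = {2: F(1, 2), 5: F(1, 4), 7: F(1, 4)}   # distribution of X
py = {3: F(1, 3), 4: F(1, 3), 5: F(1, 3)}   # distribution of Y

# Independence gives p(x, y) = p(x) p(y); accumulate by the value of x + y
pz = defaultdict(F)
for x, p in px.items():
    for y, q in py.items():
        pz[x + y] += p * q

for z in sorted(pz):
    print(z, pz[z])
```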
Problem 51. X and Y are independent random variables. X takes the values 1, 2
with probabilities 0.7, 0.3 and Y takes the values −2, 5, 8 with probabilities
0.3, 0.5, 0.2. Find the joint distribution of X and Y . Hence find COV (X, Y ).
[VTU Jan 2018]
xi       1      2
p(xi)    0.7    0.3
yj       −2     5      8
p(yj)    0.3    0.5    0.2
Since X and Y are independent, the joint probabilities p(xi , yj ) are obtained by
using p(xi , yj ) = p (xi ) × p (yj )
X\Y      −2      5       8       p(xi)
1        0.21    0.35    0.14    0.7
2        0.09    0.15    0.06    0.3
p(yj)    0.3     0.5     0.2     1
We have COV(X, Y ) = E(XY ) − µX µY where
\[ \mu_X = E(X) = \sum x_i\, p(x_i) = (1)(0.7) + (2)(0.3) = 1.3 \]
\[ \mu_Y = E(Y) = \sum y_j\, p(y_j) = (-2)(0.3) + (5)(0.5) + (8)(0.2) = 3.5 \]
\[ E(XY) = \sum_{i,j} x_i y_j P(x_i, y_j) \]
X\Y 0 1 2 3 Sum
0 0 k 2k 3k 6k
1 2k 3k 4k 5k 14k
2 4k 5k 6k 7k 22k
Sum 6k 9k 12k 15k 42k
$\sum_{i,j} p(x_i, y_j) = 1 \Rightarrow 42k = 1 \quad \therefore k = \frac{1}{42}$
(b) Marginal probability distribution of X and Y are given by
xi       0      1      2
p(xi)    1/7    1/3    11/21
yj       0      1       2      3
p(yj)    1/7    3/14    2/7    5/14
= =
42 7
Problem 53. A fair coin is tossed thrice. The random variables X and Y are defined
as follows: X = 0, 1 according as head or tail occurs on the first toss, y= no. of
heads.
(i) Determine the marginal probability distribution of X and Y (ii)Determine the joint
distribution of X and Y (iii) Determine E(X), E(Y) and E(XY) (iv) Determine σx , σy
[VTU]
Solution :
The sample space S and the association of random variables X and Y is given by
the following table.
xi       0      1           yj       0      1      2      3
p(xi)    1/2    1/2         p(yj)    1/8    3/8    3/8    1/8
(X = 0 implies that there is a head turn out and Y = 0 implies that there the total number he
which is impossible event).
P (X = 0, Y = 1) = 1/8 (corresponding to the outcome HTT)
P (X = 0, Y = 2) = 2/8 = 1/4 (outcomes are HHT and HTH)
P (X = 0, Y = 3) = 1/8; (outcome is HHH)
P (X = 1, Y = 0) = 1/8; (outcome is TTT)
P (X = 1, Y = 1) = 2/8 = 1/4; (outcomes are THT, TTH)
P (X = 1, Y = 2) = 1/8; (outcome is THH)
P (X = 1, Y = 3) = 0; (since the outcome is impossible).
(The above values can be written by looking at the table of S, X, Y )
The required joint probability distribution of X and Y is as follows.
X\Y                      0      1      2      3      Sum of row entries
0                        0      1/8    1/4    1/8    1/2
1                        1/8    1/4    1/8    0      1/2
Sum of column entries    1/8    3/8    3/8    1/8    1
\[ \mu_X = E(X) = \sum x_i\, p(x_i) \]
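\[
\begin{aligned}
\sigma_X^2 &= E(X^2) - \mu_X^2 = \sum x_i^2\, p(x_i) - \mu_X^2 \\
&= (0)^2 (1/2) + (1)^2 (1/2) - 1/4 = 1/4 \\
\Rightarrow \sigma_X &= \frac{1}{2}
\end{aligned}
\]
\[
\begin{aligned}
\sigma_Y^2 &= E(Y^2) - \mu_Y^2 \\
&= 3/4 \\
\Rightarrow \sigma_Y &= \frac{\sqrt{3}}{2}
\end{aligned}
\]
\[ \text{COV}(X, Y) = E(XY) - \mu_X \mu_Y = \frac{1}{2} - \frac{3}{4} = -\frac{1}{4} \]
\[ \rho(X, Y) = \frac{\text{COV}(X, Y)}{\sigma_X \sigma_Y} = \frac{-\frac{1}{4}}{\frac{\sqrt{3}}{4}} = -\frac{1}{\sqrt{3}} \]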
Problem 54. The joint distribution of two random variables X and Y is as follows.
X / Y -2 -1 4 5
1 0.1 0.2 0 0.3
2 0.2 0.1 0.1 0
Determine (i) Marginal Distributions of X and Y (ii) Covariance of X and Y (iii)
Correlation of X and Y (iv) Find p(X + Y > 0) [VTU Jan 2020, June 2019, Dec
2010]
(ii)
\[ E(XY) = \sum_x \sum_y x\,y\,p(x, y) \]
∴ COV (X, Y ) = E(XY ) − µX µY = 0.9 − 1.4 × 1 = −0.5
(iii)
\[ E(X^2) = \sum_x x^2\, p_X(x) = 1^2 \times 0.6 + 2^2 \times 0.4 = 2.2 \]
\[ \therefore \sigma_X^2 = E(X^2) - \mu_X^2 = 2.2 - (1.4)^2 = 0.24 \]
⇒ σX = 0.49
\[ E(Y^2) = \sum_y y^2 f_Y(y) \]
⇒ σY = 3.1
Now,
\[ \rho(X, Y) = \frac{\text{Cov}(X, Y)}{\sigma_X \sigma_Y} = \frac{-0.5}{0.49 \times 3.1} = -0.329 \]
(iv) X + Y > 0 is possible when (X, Y ) take the values (1, 4); (1, 5); (2, −1); (2, 4)
and (2, 5)
Hence
P (X + Y > 0) = p(1, 4) + p(1, 5) + p(2, −1) + p(2, 4) + p(2, 5)
= 0 + 0.3 + 0.1 + 0.1 + 0
= 0.5
Problem 55. A fair coin is tossed 4 times. Let X denote the no. of heads occurring
and let Y denote the longest string of heads occurring. Find (i) the joint distribution
of X and Y (ii) Marginal distributions of X and Y (iii)COV(X,Y) [VTU June 2011]
Solution : When a fair coin is tossed four times, the set of all possible outcomes
S = {T T T T, T HT T, HT T T, T T HT, T T T H, T T HH, HT T H, T HHT,
T HT H, HHT T, HT HT, T HHH, HT HH, HHT H, HHHT, HHHH}
f (2, 0) = 0
f (2, 1) = P [THTH, HTHT, HTTH] = 3/16
f (2, 2) = P [TTHH, THHT, HHTT] = 3/16
f (2, 3) = f (2, 4) = 0, f (3, 0) = f (3, 1) = 0
f (3, 2) = P [HTHH, HHTH] = 2/16
f (3, 3) = P [THHH, HHHT] = 2/16
f (3, 4) = 0
f (4, 0) = f (4, 1) = f (4, 2) = f (4, 3) = 0
f (4, 4) = P [HHHH] = 1/16
The joint distribution table is,
X/Y           0       1       2       3       4       Row Sum
0             1/16    0       0       0       0       1/16
1             0       4/16    0       0       0       4/16
2             0       3/16    3/16    0       0       6/16
3             0       0       2/16    2/16    0       4/16
4             0       0       0       0       1/16    1/16
Column Sum    1/16    7/16    5/16    2/16    1/16    1
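(b) The marginal distribution of X is
X         0       1       2       3       4
fX(x)     1/16    4/16    6/16    4/16    1/16
and the marginal distribution of Y is
Y         0       1       2       3       4
fY(y)     1/16    7/16    5/16    2/16    1/16
\[
\begin{aligned}
\mu_X = E(X) &= \sum_x x f_X(x) \\
&= 0 \times \frac{1}{16} + 1 \times \frac{4}{16} + 2 \times \frac{6}{16} + 3 \times \frac{4}{16} + 4 \times \frac{1}{16} \\
&= 2
\end{aligned}
\]
\[
\begin{aligned}
\mu_Y = E(Y) &= \sum_y y f_Y(y) \\
&= 0 \times \frac{1}{16} + 1 \times \frac{7}{16} + 2 \times \frac{5}{16} + 3 \times \frac{2}{16} + 4 \times \frac{1}{16} \\
&= \frac{27}{16}
\end{aligned}
\]
\[
\begin{aligned}
\text{Now, } E(XY) &= \sum_x \sum_y x\,y\,f(x, y) \\
&= 0 \times 0 \times \frac{1}{16} + 1 \times 1 \times \frac{4}{16} + 2 \times 1 \times \frac{3}{16} + 2 \times 2 \times \frac{3}{16} \\
&\quad + 3 \times 2 \times \frac{2}{16} + 3 \times 3 \times \frac{2}{16} + 4 \times 4 \times \frac{1}{16} \\
&= \frac{17}{4}
\end{aligned}
\]
\[ \therefore \text{Cov}(X, Y) = E(XY) - \mu_X \mu_Y = \frac{17}{4} - 2 \times \frac{27}{16} = 0.875 \]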
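The joint table of Problem 55 can be rebuilt by brute force: list all 16 equally likely outcomes, record the number of heads and the longest run of heads for each, and count. The sketch below does this with the Python standard library; the helper name `longest_head_run` is illustrative.

```python
from collections import Counter
from fractions import Fraction as F
from itertools import product

def longest_head_run(outcome):
    """Length of the longest unbroken string of heads in a sequence of tosses."""
    best = run = 0
    for toss in outcome:
        run = run + 1 if toss == "H" else 0
        best = max(best, run)
    return best

# Count how many of the 16 outcomes give each pair (X, Y)
counts = Counter()
for outcome in product("HT", repeat=4):
    counts[(outcome.count("H"), longest_head_run(outcome))] += 1

joint = {xy: F(c, 16) for xy, c in counts.items()}

ex = sum(x * p for (x, y), p in joint.items())
ey = sum(y * p for (x, y), p in joint.items())
exy = sum(x * y * p for (x, y), p in joint.items())
cov = exy - ex * ey

print(ex, ey, exy, cov)
```

Only the pairs that actually occur are stored in `joint`; every other cell of the table is zero.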
2.4 Stochastic process
Let S be the sample space, representing the set of all possible outcomes of a random
experiment, and R be the set of all real numbers. A random variable X is a function
f from S to R, denoted as X = f (s), where s ∈ S. We introduce an index set
T ⊂ R, indexed by the parameter t representing time.
Here, X0 = X(0) is the value of the stochastic process at the initial time point,
referred to as the initial state of the system.
Example : Let’s consider a simple example of the daily temperature in a city. Each
day, the temperature can be considered a random variable, and the entire sequence
of daily temperatures forms a stochastic process.
Let’s denote the daily temperature as X(t), where t is the day of the year. The index
set T is the set of all days in a year. Mathematically, we can represent this stochastic
process as {X(t), t ∈ T }, where each X(t) is the temperature on day t. The
initial state of the system, X0 , corresponds to the temperature on the first day of the
year.
This stochastic process captures the idea that daily temperatures vary randomly
throughout the year. The randomness is introduced by factors such as weather patterns,
seasonality, and other atmospheric conditions.
Classification of Stochastic Processes :
2.5 Vector:
(b) v = (1/2, 0, 1/3, 1/6, 1/6)
(c) w = (1/12, 1/2, 1/6, 1/4)
(d) r = (1/3, 0, 0, 1/6, 1/2)
Here, (a) u is not a probability vector since its third component is negative.
(b) v is not a probability vector since the sum of the components is not equal to 1 .
(c) w is a probability vector since the components are non negative and their sum is
1.
(d) r is also a probability vector since the components are non negative and their sum
is 1 .
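The two conditions used above (no negative component, components summing to 1) are easy to test mechanically. The sketch below checks the vectors v, w and r with exact fractions; the function name `is_probability_vector` is illustrative.

```python
from fractions import Fraction as F

def is_probability_vector(v):
    """True when every component is non-negative and the components sum to 1."""
    return all(c >= 0 for c in v) and sum(v) == 1

v = [F(1, 2), 0, F(1, 3), F(1, 6), F(1, 6)]
w = [F(1, 12), F(1, 2), F(1, 6), F(1, 4)]
r = [F(1, 3), 0, 0, F(1, 6), F(1, 2)]

for name, vec in (("v", v), ("w", w), ("r", r)):
    print(name, sum(vec), is_probability_vector(vec))
```

Exact fractions are used so that the comparison with 1 is not affected by floating-point rounding.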
2.7 Stochastic Matrix:
A regular stochastic matrix is a stochastic matrix some positive integer power of which has all entries positive. Specifically, a stochastic matrix P is regular if there exists a positive integer k such that every entry of $P^k$ is strictly positive.
Note : A stochastic matrix P is not regular if a 1 occurs on the principal (main) diagonal.
Example: (a) Let
\[
A = \begin{bmatrix} 0 & 1 \\ \tfrac12 & \tfrac12 \end{bmatrix}.
\quad\text{Then}\quad
A^2 = \begin{bmatrix} 0 & 1 \\ \tfrac12 & \tfrac12 \end{bmatrix}
      \begin{bmatrix} 0 & 1 \\ \tfrac12 & \tfrac12 \end{bmatrix}
    = \begin{bmatrix} \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac34 \end{bmatrix}
\]
All entries of $A^2$ are positive, ∴ A is a regular stochastic matrix (k = 2).
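(b) Consider
\[
A = \begin{bmatrix} 0 & 0 & 1 \\ \tfrac12 & 0 & \tfrac12 \\ 0 & 1 & 0 \end{bmatrix}
\]
Here,
\[
A^2 = \begin{bmatrix} 0 & 1 & 0 \\ 0 & \tfrac12 & \tfrac12 \\ \tfrac12 & 0 & \tfrac12 \end{bmatrix},\quad
A^3 = \begin{bmatrix} \tfrac12 & 0 & \tfrac12 \\ \tfrac14 & \tfrac12 & \tfrac14 \\ 0 & \tfrac12 & \tfrac12 \end{bmatrix},\quad
A^4 = \begin{bmatrix} 0 & \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac14 & \tfrac12 \\ \tfrac14 & \tfrac12 & \tfrac14 \end{bmatrix},\quad
A^5 = \begin{bmatrix} \tfrac14 & \tfrac12 & \tfrac14 \\ \tfrac18 & \tfrac12 & \tfrac38 \\ \tfrac14 & \tfrac14 & \tfrac12 \end{bmatrix}.
\]
All elements in $A^5$ are positive. Hence A is regular.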
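Forming successive powers by hand is tedious, so the regularity test can be sketched in code. This is an illustrative sketch only: `mat_mul` and `regularity_power` are hypothetical helper names, and the default search limit uses Wielandt's bound (a regular n × n matrix already has a strictly positive power by exponent (n − 1)² + 1), which is an outside fact not stated in these notes.

```python
from fractions import Fraction as F

def mat_mul(a, b):
    """Product of two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def regularity_power(p, max_power=None):
    """Smallest k with every entry of p**k positive, or None if none is found."""
    n = len(p)
    if max_power is None:
        max_power = (n - 1) ** 2 + 1      # Wielandt's bound (assumption, see lead-in)
    power = p
    for k in range(1, max_power + 1):
        if all(x > 0 for row in power for x in row):
            return k
        power = mat_mul(power, p)
    return None

A2 = [[F(0), F(1)], [F(1, 2), F(1, 2)]]                       # example (a)
A3 = [[F(0), F(0), F(1)], [F(1, 2), F(0), F(1, 2)], [F(0), F(1), F(0)]]  # example (b)
I2 = [[F(1), F(0)], [F(0), F(1)]]                             # 1s on the main diagonal

print(regularity_power(A2), regularity_power(A3), regularity_power(I2))
```

For the two examples above the function reports k = 2 and k = 5, and it returns `None` for the identity matrix, in line with the note about a 1 on the main diagonal.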
The following properties are associated with a regular stochastic matrix P of order
n.
(ii) P has a unique fixed probability vector v such that $vP = v$ and $\sum_i v_i = 1$.
(iii) The sequence $P^2, P^3, \ldots$ approaches the matrix V whose rows are each the fixed probability vector v.
(iv) If u is any probability vector, then the sequence of vectors $uP, uP^2, \ldots$ approaches the unique fixed probability vector v.
Problem 57. Find the unique fixed probability vector for the regular stochastic matrix
\[
A = \begin{bmatrix} 0 & 1 & 0 \\ \tfrac16 & \tfrac12 & \tfrac13 \\ 0 & \tfrac23 & \tfrac13 \end{bmatrix}
\]
provides the information required for a probability vector and ensures a unique and
non-trivial solution.
Please select any two equations from (1), (2), and (3) and solve them together with
(4).
For example, by solving (1), (2), and (4), we find the solution:
\[
v_1 = \tfrac{1}{10},\quad v_2 = \tfrac{6}{10},\quad v_3 = \tfrac{3}{10}
\]
To confirm the accuracy of this solution, substitute these values into equation (3) and check if it holds true. Thus, the required unique fixed probability vector v is given by $v = (v_1, v_2, v_3) = \left(\tfrac{1}{10}, \tfrac{6}{10}, \tfrac{3}{10}\right)$.
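The hand method (two of the balance equations plus the normalisation $v_1 + v_2 + v_3 = 1$) can be sketched as an exact linear solve. This is a minimal sketch under stated assumptions: `fixed_probability_vector` is a hypothetical helper, it uses plain Gauss-Jordan elimination on fractions, and it assumes the matrix is regular so that the solution is unique.

```python
from fractions import Fraction as F

def fixed_probability_vector(p):
    """Solve v P = v together with sum(v) = 1 by exact Gauss-Jordan elimination."""
    n = len(p)
    # Equation j: sum_i v_i * (p[i][j] - delta_ij) = 0  (augmented column last).
    rows = [[F(p[i][j]) - (1 if i == j else 0) for i in range(n)] + [F(0)]
            for j in range(n)]
    # The n balance equations are linearly dependent: replace the last by sum(v) = 1.
    rows[-1] = [F(1)] * n + [F(1)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [x / rows[col][col] for x in rows[col]]
        for r in range(n):
            if r != col and rows[r][col] != 0:
                factor = rows[r][col]
                rows[r] = [x - factor * y for x, y in zip(rows[r], rows[col])]
    return [rows[i][n] for i in range(n)]

A = [[F(0), F(1), F(0)],
     [F(1, 6), F(1, 2), F(1, 3)],
     [F(0), F(2, 3), F(1, 3)]]
print(fixed_probability_vector(A))
```

For the matrix of Problem 57 this returns the fractions 1/10, 3/5 and 3/10, which agrees with the vector found by hand.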
Problem 58. Show that
\[
P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & 0 \end{bmatrix}
\]
is a regular stochastic matrix and find the corresponding unique fixed probability vector.
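\[
\Rightarrow\ \frac{v_3}{2} = v_1,\qquad v_1 + \frac{v_3}{2} = v_2,\qquad v_2 = v_3
\]
Rearranging, we get
\begin{align*}
-v_1 + 0v_2 + \frac{v_3}{2} &= 0 \tag{1}\\
v_1 - v_2 + \frac{v_3}{2} &= 0 \tag{2}\\
0v_1 + v_2 - v_3 &= 0 \tag{3}\\
v_1 + v_2 + v_3 &= 1 \tag{4}
\end{align*}
By solving (1), (2), and (4), we find the solution:
\[
v_1 = \tfrac15,\quad v_2 = \tfrac25,\quad v_3 = \tfrac25
\]
Thus $[v_1, v_2, v_3] = \left[\tfrac15, \tfrac25, \tfrac25\right]$ is the required unique fixed probability vector of P.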
Problem 59. Find the unique fixed probability vector for the regular stochastic matrix,
\[
A = \begin{bmatrix} 0 & \tfrac34 & \tfrac14 \\ \tfrac12 & \tfrac12 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\]
Writing out $vA = v$ for $v = (v_1, v_2, v_3)$ column by column and rearranging, we get
\begin{align*}
-v_1 + \tfrac12 v_2 + 0v_3 &= 0 \tag{1}\\
\tfrac34 v_1 - \tfrac12 v_2 + v_3 &= 0 \tag{2}\\
\tfrac14 v_1 + 0v_2 - v_3 &= 0 \tag{3}\\
v_1 + v_2 + v_3 &= 1 \tag{4}
\end{align*}
Equations (1), (2) and (3) are linearly dependent (they add up to 0 = 0), so only two of them carry independent information. Equation (4) provides the normalisation required for a probability vector and ensures a unique and non-trivial solution.
Select any two equations from (1), (2), and (3) and solve them together with (4).
For example, by solving (1), (3), and (4), we find the solution:
\[
v_1 = \tfrac{4}{13},\quad v_2 = \tfrac{8}{13},\quad v_3 = \tfrac{1}{13}
\]
To confirm the accuracy of this solution, substitute these values into equation (2): $\tfrac{3}{13} - \tfrac{4}{13} + \tfrac{1}{13} = 0$. Thus, the required unique fixed probability vector v is given by
\[
v = \left(\tfrac{4}{13},\ \tfrac{8}{13},\ \tfrac{1}{13}\right).
\]
I = {a1 , a2 , . . . am }. i.e. X0 , X1 , X2 , . . . is a sequence of random variables and
a1 , a2 , . . . am are the possible values of these random variables. When Xn = ai ,
we say that the system is in state ai at time n, or at the nth step. If the prob-
ability that Xn+1 = aj given Xn = ai is independent of the states X0 , X1 ,
X2 , . . . , Xn−1 then we say that the stochastic process {X0 , X1 , X2 , . . .} is a
Markov (memoryless) Chain, i.e., in a Markov Chain the future outcomes depend
only on the present outcome and are independent of the past outcomes.
In a Markov chain, the system undergoes transitions from one state to another, and
the probability of transitioning to any particular state depends solely on the current
state, not on the sequence of events that preceded it. This property is known as the
Markov property. Associated with each ordered pair of states $(a_i, a_j)$, the number $p_{ij}$ gives the probability that the system changes from the i-th state to the j-th state, i.e.,
\[
p_{ij} = P\{X_{n+1} = a_j \mid X_n = a_i\}.
\]
In other words, $p_{ij}$ is the probability that $a_j$ occurs immediately after $a_i$ occurs. The numbers $p_{ij}$ are known as transition probabilities.
The transition probabilities $p_{ij}$ satisfy $p_{ij} \ge 0$ and $\sum_{j=1}^{m} p_{ij} = 1$ for $i = 1, 2, \ldots, m$,
i.e. when i = 1, we have $p_{11} + p_{12} + \cdots + p_{1m} = 1$
when i = 2, we have $p_{21} + p_{22} + \cdots + p_{2m} = 1$
...
when i = m, we have $p_{m1} + p_{m2} + \cdots + p_{mm} = 1$
These probabilities may be arranged in the matrix form, P which is the square matrix
Here, the i-th row of P, namely $(p_{i1}, p_{i2}, \ldots, p_{im})$, gives the probabilities that the system will change from $a_i$ to $a_1, a_2, a_3, \ldots, a_m$ respectively. P is called the transition probability matrix (t.p.m.) of the Markov Chain.
A Markov Chain is said to be irreducible if every state can be reached from every other state, i.e., for any two states $a_i$ and $a_j$, we have $p_{ij}^{(n)} > 0$ for some $n \ge 1$.
When the transition matrix of the Markov Chain is regular, all elements of some
power of P are positive. Hence the Markov Chain is irreducible when P is regular.
2.14 n-step transition probabilities
If P is the transition matrix of a Markov chain and the probability that the chain moves from state i to state j in exactly n steps is denoted by $p_{ij}(n)$ or $p_{ij}^{(n)}$, then the n-step transition matrix $P^{(n)}$ is equal to the nth power of P, i.e., $P^{(n)} = P^n$.
In other words, the problem of finding the n-step transition probabilities is reduced
to one of forming powers of a given matrix.
Probability distribution of the system at some arbitrary time is denoted by the prob-
ability vector.
p = (p1 , p2 , . . . , pm ) = (p(a1 ), p(a2 ), p(a3 ), . . . , p(am ))
2.15
\[
p^{(2)} = p^{(1)} P = p^{(0)} P P = p^{(0)} P^2
\]
\[
p^{(3)} = p^{(2)} P = p^{(0)} P^2 P = p^{(0)} P^3 \quad\text{etc.}
\]
Similarly, the nth step probability distribution (i.e., the distribution after the first n steps) is given by
\[
p^{(n)} = p^{(n-1)} P = \cdots = p^{(0)} P^n
\]
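The recursion $p^{(n)} = p^{(n-1)} P$ translates directly into a loop. The sketch below is illustrative only: the helper names `step` and `distribution_after` are hypothetical, and the two-state matrix is made up for this example rather than taken from the notes.

```python
from fractions import Fraction as F

def step(dist, p):
    """One transition: the row vector dist multiplied by the matrix p."""
    return [sum(dist[i] * p[i][j] for i in range(len(p)))
            for j in range(len(p[0]))]

def distribution_after(p0, p, n):
    """Distribution after n steps, starting from the initial distribution p0."""
    dist = list(p0)
    for _ in range(n):
        dist = step(dist, p)
    return dist

# Illustrative two-state chain (values chosen for this sketch only).
P = [[F(1, 2), F(1, 2)],
     [F(1, 4), F(3, 4)]]
p0 = [F(1), F(0)]
print(distribution_after(p0, P, 2))
```

Starting from state 1 with certainty, two steps give the distribution (3/8, 5/8); repeated stepping avoids forming $P^n$ explicitly when only one starting distribution is of interest.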
Let P be a regular transition matrix of a Markov chain. Then in the long run, the
probability that any state aj occurs is approximately equal to the component vj of
the unique fixed probability vector v of P .
In other words, Stationary distribution of a Markov chain is the unique fixed
probability vector v of the regular transition matrix P of the Markov chain because
every sequence of probability distributions approaches v.
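The convergence claim can be observed numerically. The following sketch (floating point, hypothetical helper `step`) repeatedly multiplies a starting distribution by the regular matrix A of Problem 57 and compares the result with its fixed probability vector (0.1, 0.6, 0.3); the starting vector and the number of iterations are arbitrary choices made for this illustration.

```python
def step(dist, p):
    """One transition: the row vector dist multiplied by the matrix p."""
    return [sum(dist[i] * p[i][j] for i in range(len(p)))
            for j in range(len(p[0]))]

# Regular stochastic matrix from Problem 57.
A = [[0.0, 1.0, 0.0],
     [1 / 6, 1 / 2, 1 / 3],
     [0.0, 2 / 3, 1 / 3]]

u = [1.0, 0.0, 0.0]          # arbitrary starting probability vector
for _ in range(60):          # 60 steps is an arbitrary, generous choice
    u = step(u, A)

v = [0.1, 0.6, 0.3]          # fixed probability vector found in Problem 57
print(max(abs(ui - vi) for ui, vi in zip(u, v)))
```

Any other starting probability vector leads to the same limit, which is what makes v the stationary distribution.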
2.16 Absorbing States
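Consider
\[
P^2 = \frac{1}{36}
\begin{bmatrix} 0 & 4 & 2 \\ 3 & 0 & 3 \\ 3 & 3 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 4 & 2 \\ 3 & 0 & 3 \\ 3 & 3 & 0 \end{bmatrix}
= \frac{1}{36}
\begin{bmatrix} 18 & 6 & 12 \\ 9 & 21 & 6 \\ 9 & 12 & 15 \end{bmatrix}
\]
Since all the entries in $P^2$ are positive we conclude that the t.p.m. P is regular. Hence the Markov chain having t.p.m. P is irreducible.
Next we shall find the fixed probability vector of P. If $v = (v_1, v_2, v_3)$, we shall find v such that $vP = v$ where $v_1 + v_2 + v_3 = 1$.
That is,
\[
[v_1, v_2, v_3] \cdot \frac16 \begin{bmatrix} 0 & 4 & 2 \\ 3 & 0 & 3 \\ 3 & 3 & 0 \end{bmatrix} = [v_1, v_2, v_3]
\ \Rightarrow\
\frac16\,[\,3v_2 + 3v_3,\ 4v_1 + 3v_3,\ 2v_1 + 3v_2\,] = [v_1, v_2, v_3]
\]
\[
\Rightarrow\ 3v_2 + 3v_3 = 6v_1;\quad 4v_1 + 3v_3 = 6v_2;\quad 2v_1 + 3v_2 = 6v_3
\]
Solving these together with $v_1 + v_2 + v_3 = 1$ we obtain $v_1 = 1/3$, $v_2 = 10/27$, $v_3 = 8/27$.
Thus v = (1/3, 10/27, 8/27) is the required stationary probability vector.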
Problem 61. Every year, a man trades his car for a new car. If he has a Maruti, he trades it for an Ambassador. If he has an Ambassador, he trades it for a Santro. However, if he has a Santro, he is just as likely to trade it for a new Santro as to trade it for a Maruti or an Ambassador. In 2000 he bought his first car, which was a Santro.
Find the probability that he has
(i) 2002 Santro
(ii) 2002 Maruti
(iii) 2003 Ambassador
(iv) 2003 Santro
(v) In the long run, how often will he have a Santro?
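in 2000 (his first purchase).
In 2001 (after one step, i.e. one year), let us find $p^{(1)}$:
\[
p^{(1)} = p^{(0)} P = (0\ \ 0\ \ 1)
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{bmatrix}
= \left(\tfrac13,\ \tfrac13,\ \tfrac13\right)
\]
In 2002 (after two steps), let us find $p^{(2)}$:
\[
p^{(2)} = p^{(1)} P = \left(\tfrac13\ \ \tfrac13\ \ \tfrac13\right)
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{bmatrix}
= \left(\tfrac19,\ \tfrac49,\ \tfrac49\right)
\]
with the components in the order (M), (A), (S).
Hence (i) Probability that he has 2002 Santro = 4/9
(ii) Probability that he has 2002 Maruti = 1/9
In 2003, let us find $p^{(3)}$:
\[
p^{(3)} = p^{(2)} P = \left(\tfrac19\ \ \tfrac49\ \ \tfrac49\right)
\begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \tfrac13 & \tfrac13 & \tfrac13 \end{bmatrix}
= \left(\tfrac{4}{27},\ \tfrac{7}{27},\ \tfrac{16}{27}\right)
\]
with the components in the order (M), (A), (S).
Hence (iii) Probability that he has 2003 Ambassador = 7/27
(iv) Probability that he has 2003 Santro = 16/27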
(v) To discover what happens in the long run, we must find a fixed probability vector $v = (v_1, v_2, v_3)$ of P, i.e. a vector with $vP = v$ and
\[
v_1 + v_2 + v_3 = 1
\]
or
\begin{align*}
v_1(0) + v_2(0) + \tfrac13 v_3 &= v_1\\
v_1 + v_2(0) + \tfrac13 v_3 &= v_2\\
v_1(0) + v_2 + \tfrac13 v_3 &= v_3\\
v_1 + v_2 + v_3 &= 1
\end{align*}
Rearranging, we get
\begin{align*}
-v_1 + (0)v_2 + \tfrac13 v_3 &= 0 \tag{1}\\
v_1 - v_2 + \tfrac13 v_3 &= 0 \tag{2}\\
(0)v_1 + v_2 - \tfrac23 v_3 &= 0 \tag{3}\\
v_1 + v_2 + v_3 &= 1 \tag{4}
\end{align*}
Equations (1), (2) and (3) are linearly dependent (they add up to 0 = 0), so only two of them are needed. Equation (4) provides the normalisation required for a probability vector and ensures a unique and non-trivial solution. Hence select any two equations from (1), (2) and (3), solve them together with (4), and then verify the answer using the equation which was not used.
For example, by solving (1), (2), and (4), we find the solution:
\[
v_1 = \tfrac16,\quad v_2 = \tfrac13,\quad v_3 = \tfrac36 = \tfrac12
\]
To confirm the accuracy of this solution, substitute these values into equation (3) and check if it holds true.
Thus
\[
v = (v_1, v_2, v_3) = \left(\tfrac16,\ \tfrac13,\ \tfrac12\right)
\]
with the components in the order (M), (A), (S), so in the long run he has a Santro 1/2 of the time.
Solution : Here, the state space is I = {A has the ball, B has the ball, C has the ball} and the associated t.p.m. is as follows.
\[
P = \begin{array}{c|ccc}
 & (A) & (B) & (C) \\ \hline
(A) & 0 & 1 & 0 \\
(B) & 0 & 0 & 1 \\
(C) & \tfrac12 & \tfrac12 & 0
\end{array}
\]
Initially, if C has the ball, the associated initial probability vector (components in the order (A), (B), (C)) is given by
\[
p^{(0)} = (0,\ 0,\ 1)
\]
Since the probabilities are desired after three throws we have to find $p^{(3)} = p^{(0)} P^3$ (or we can also find $p^{(1)} = p^{(0)} P$, $p^{(2)} = p^{(1)} P$ and then $p^{(3)} = p^{(2)} P$).
By finding the powers of P, we get
\[
P^3 = \begin{bmatrix} \tfrac12 & \tfrac12 & 0 \\ 0 & \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac14 & \tfrac12 \end{bmatrix}
\]
\[
\therefore\ p^{(3)} = p^{(0)} P^3 = (0\ \ 0\ \ 1)
\begin{bmatrix} \tfrac12 & \tfrac12 & 0 \\ 0 & \tfrac12 & \tfrac12 \\ \tfrac14 & \tfrac14 & \tfrac12 \end{bmatrix}
= \left(\tfrac14,\ \tfrac14,\ \tfrac12\right)
\]
Thus after three throws the probability that the ball is with A is 1/4, with B is 1/4 and with C is 1/2.
Problem 63. A gambler’s luck follows a pattern such that if he wins a game, the probability of winning the next game is 0.6, and if he loses a game, the probability of losing the next game is 0.7. There is an even chance that the gambler wins the first game. What is the probability that he wins (i) the second game, (ii) the third game? (iii) In the long run, how often will he win?
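Solution : The state space is I = {Win (W), Lose (L)} and the associated transition probability matrix is as follows.
\[
P = \begin{array}{c|cc}
 & (W) & (L) \\ \hline
(W) & 0.6 & 0.4 \\
(L) & 0.3 & 0.7
\end{array}
= \frac{1}{10}\begin{bmatrix} 6 & 4 \\ 3 & 7 \end{bmatrix}
\]
Given that the probability of winning the first game is 1/2, the initial state (first game) probability vector is
\[
p^{(0)} = \left(\tfrac12,\ \tfrac12\right)
\]
(i) Now for the second game,
\[
p^{(1)} = p^{(0)} P = \left(\tfrac12,\ \tfrac12\right)\begin{bmatrix} 0.6 & 0.4 \\ 0.3 & 0.7 \end{bmatrix}
= \frac12\,[1,\ 1] \cdot \frac{1}{10}\begin{bmatrix} 6 & 4 \\ 3 & 7 \end{bmatrix}
= \frac{1}{20}\,[9,\ 11]
\]
Hence
\[
p^{(1)} = \left[\tfrac{9}{20},\ \tfrac{11}{20}\right]
\]
with the components in the order (W), (L).
Thus the probability of his winning the second game is 9/20.
(ii) For the third game,
\[
p^{(2)} = p^{(1)} P = \frac{1}{20}\,[9,\ 11] \cdot \frac{1}{10}\begin{bmatrix} 6 & 4 \\ 3 & 7 \end{bmatrix}
= \frac{1}{200}\,[87,\ 113]
\]
Hence
\[
p^{(2)} = \left[\tfrac{87}{200},\ \tfrac{113}{200}\right]
\]
with the components in the order (W), (L).
Thus the probability of winning the third game is 87/200.
(iii) In the long run, we shall find the fixed probability vector $v = (v_1, v_2)$ such that $vP = v$ where $v_1 + v_2 = 1$, i.e.
\[
[v_1,\ v_2]\ \frac{1}{10}\begin{bmatrix} 6 & 4 \\ 3 & 7 \end{bmatrix} = [v_1,\ v_2]
\quad\text{and}\quad v_1 + v_2 = 1
\]
\[
\Rightarrow\ 6v_1 + 3v_2 = 10v_1,\qquad 4v_1 + 7v_2 = 10v_2,\qquad v_1 + v_2 = 1
\]
or
\begin{align*}
-4v_1 + 3v_2 &= 0 \tag{1}\\
4v_1 - 3v_2 &= 0 \tag{2}\\
v_1 + v_2 &= 1 \tag{3}
\end{align*}
Select any one equation from (1) and (2), and solve it with (3).
For example, by solving (1) and (3), we find the solution:
\[
v_1 = \tfrac37 \quad\text{and}\quad v_2 = \tfrac47
\]
To confirm the accuracy of this solution, substitute these values into equation (2) and check if it holds true.
Hence
\[
v = \left(\tfrac37,\ \tfrac47\right)
\]
with the components in the order (W), (L).
Thus in the long run he wins 3/7 = 42.857% of the time.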
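For a chain with only two states the fixed vector has a simple closed form. Writing a for the probability of leaving the first state and b for the probability of leaving the second, the single balance equation $a v_1 = b v_2$ together with $v_1 + v_2 = 1$ gives $v = (b/(a+b),\ a/(a+b))$. The sketch below applies this to the gambler's chain (a = 0.4, b = 0.3); the function name `two_state_stationary` is a hypothetical helper introduced for illustration.

```python
from fractions import Fraction as F

def two_state_stationary(a, b):
    """Fixed probability vector of [[1 - a, a], [b, 1 - b]], assuming a + b > 0."""
    return (b / (a + b), a / (a + b))

# Gambler's chain: P(W -> L) = 0.4 and P(L -> W) = 0.3.
v_win, v_lose = two_state_stationary(F(4, 10), F(3, 10))
print(v_win, v_lose)
```

This reproduces v = (3/7, 4/7) without solving the system again.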
Problem 64. The following figure shows four compartments with doors leading from one to another. A mouse in any compartment is equally likely to pass through each of the doors of the compartment. Find the transition matrix of the Markov chain. Draw the transition diagram.
Solution : The 4 rooms are considered as four states say 1, 2, 3, 4. Since mouse
is moving, it does not stay in the same room. From room 1 it can go to 4 or 2 with
The transition diagram is
Problem 65. Suppose an urn A contains 2 white marbles and urn B contains 4 red
marbles. At each step of the process, a marble is selected at random from each urn
and the two marbles selected are interchanged. Let Xn denote the number of red
marbles in urn A after n interchanges.
(i) Find the transition matrix P .
(ii) What is the probability that there are 2 red marbles in urn A after 3 steps.
(iii) In the long run, what is the probability that there are 2 red marbles in urn A.
(iv) What is the stationary distribution of the system.
Solution: Here, the state of the system is the number of red marbles in urn A.
There are three states, I = {0, 1, 2}, denoted a0, a1, a2.
Since the number of marbles in the urn A is always 2 the possibilities can be repre-
sented by the following figure.
(i) Transition matrix : If the system is in the state a0, then a white marble from A and a red from B must be selected and interchanged, so that the system will now move to state a1. Accordingly the first row of the transition matrix (T.M.) is (0, 1, 0).
Now suppose the system is in a1. It can move to state a0 iff a red is drawn from A and a white from B, which has probability $\tfrac12 \cdot \tfrac14 = \tfrac18$. Thus $p_{10} = \tfrac18$.
The system can move from a1 to a2 iff a white is drawn from A and a red from B, which has probability $\tfrac12 \cdot \tfrac34 = \tfrac38$, i.e., $p_{12} = \tfrac38$. The remaining probability is $p_{11} = 1 - \tfrac18 - \tfrac38 = \tfrac12$, so the second row of the T.M. is $\left(\tfrac18, \tfrac12, \tfrac38\right)$.
Finally, suppose the system is in state a2. Note that the system can never move from state a2 to a0. However, it may remain in a2 itself, if a red from A and a red from B are chosen. In this case the probability is $1 \cdot \tfrac24 = \tfrac12$.
Lastly, if a red from A and a white from B are chosen, then the system moves from a2 to a1 with probability $\tfrac24 = \tfrac12$. Thus the third row of the T.M. is $\left(0, \tfrac12, \tfrac12\right)$.
The transition matrix P is:
\[
P = \begin{bmatrix} 0 & 1 & 0 \\ \tfrac18 & \tfrac12 & \tfrac38 \\ 0 & \tfrac12 & \tfrac12 \end{bmatrix}
\]
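The row-by-row reasoning above follows one pattern, so the matrix can also be generated from the marble counts. This is a sketch under stated assumptions: `urn_transition_matrix` is a hypothetical helper, urn A always holds k marbles, state i is the number of red marbles in A, and the counts satisfy white ≥ k and red ≥ k so that every draw described is possible.

```python
from fractions import Fraction as F

def urn_transition_matrix(k, white, red):
    """Transition matrix on states i = 0..k (red marbles in urn A of size k)."""
    size_b = white + red - k                 # urn B always holds the remaining marbles
    matrix = []
    for i in range(k + 1):
        row = [F(0)] * (k + 1)
        # Red leaves A and white enters: i red in A, white - (k - i) white in B.
        down = F(i, k) * F(white - (k - i), size_b)
        # White leaves A and red enters: k - i white in A, red - i red in B.
        up = F(k - i, k) * F(red - i, size_b)
        if i > 0:
            row[i - 1] = down
        if i < k:
            row[i + 1] = up
        row[i] = 1 - down - up               # same colour drawn from both urns
        matrix.append(row)
    return matrix

for row in urn_transition_matrix(2, 2, 4):   # 2 white marbles, 4 red marbles
    print(row)
```

With 2 white and 4 red marbles this reproduces the rows (0, 1, 0), (1/8, 1/2, 3/8) and (0, 1/2, 1/2) derived above.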
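(ii) The system starts in state a0, so that $p^{(0)} = (1, 0, 0)$ is the initial state. Now
\[
p^{(1)} = p^{(0)} P = (1\ \ 0\ \ 0)
\begin{bmatrix} 0 & 1 & 0 \\ \tfrac18 & \tfrac12 & \tfrac38 \\ 0 & \tfrac12 & \tfrac12 \end{bmatrix}
= (0,\ 1,\ 0)
\]
\[
p^{(2)} = p^{(1)} P = (0\ \ 1\ \ 0)
\begin{bmatrix} 0 & 1 & 0 \\ \tfrac18 & \tfrac12 & \tfrac38 \\ 0 & \tfrac12 & \tfrac12 \end{bmatrix}
= \left(\tfrac18,\ \tfrac12,\ \tfrac38\right)
\]
\[
p^{(3)} = p^{(2)} P = \left(\tfrac18\ \ \tfrac12\ \ \tfrac38\right)
\begin{bmatrix} 0 & 1 & 0 \\ \tfrac18 & \tfrac12 & \tfrac38 \\ 0 & \tfrac12 & \tfrac12 \end{bmatrix}
= \left(\tfrac{1}{16},\ \tfrac{9}{16},\ \tfrac{6}{16}\right)
\]
Probability that there are two red marbles in A (i.e., in state a2) after three steps is 6/16 = 3/8.
(iii) To study the system in the long run, we should find a unique fixed probability vector $v = (v_1, v_2, v_3)$ of the transition matrix P.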
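Let v be $(v_1, v_2, v_3) = (v_1, v_2, 1 - v_1 - v_2)$ (∵ $v_1 + v_2 + v_3 = 1$). Then $vP = v$ gives
\[
(v_1\ \ v_2\ \ v_3)
\begin{bmatrix} 0 & 1 & 0 \\ \tfrac18 & \tfrac12 & \tfrac38 \\ 0 & \tfrac12 & \tfrac12 \end{bmatrix}
= (v_1\ \ v_2\ \ v_3)
\]
Solving,
\begin{align*}
\tfrac18 v_2 = v_1 \quad &\text{or}\quad v_2 = 8v_1\\
v_1 + \tfrac12 v_2 + \tfrac12 v_3 = v_2 \quad &\text{or}\quad 2v_1 - v_2 + v_3 = 0\\
\tfrac38 v_2 + \tfrac12 v_3 = v_3 \quad &\text{or}\quad 3v_2 = 4v_3
\end{align*}
Now $3v_2 = 4v_3 = 4(1 - v_1 - v_2)$, so
$7v_2 = 4 - 4v_1$, or $56v_1 + 4v_1 = 4$.
\[
\therefore\ v_1 = \tfrac{4}{60} = \tfrac{1}{15},\quad v_2 = \tfrac{8}{15},\quad v_3 = \tfrac{6}{15}
\]
Thus, in the long run, the probability that there are 2 red marbles in urn A is $v_3 = \tfrac{6}{15} = \tfrac25$.
(iv) The stationary distribution of the system is $v = \left(\tfrac{1}{15}, \tfrac{8}{15}, \tfrac{6}{15}\right)$.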
2.17 Question Bank
X / Y    -2     -1     4      6
1        0.1    0.2    0      0.3
2        0.2    0.1    0.1    0
Determine (i) Marginal Distributions of X and Y (ii) Covariance of X and Y (iii) Correlation of X and Y [VTU Jan 2020, June 2019, Dec 2010]
7) A fair coin is tossed thrice. The random variables X and Y are defined as follows: X = 0 or 1 according as head or tail occurs on the first toss, Y = no. of heads.
(i) Determine the marginal probability distribution of X and Y (ii) Determine the joint distribution of X and Y (iii) Determine E(X), E(Y) and E(XY) (iv) Determine $\sigma_x$, $\sigma_y$ [VTU]
10) A fair coin is tossed 4 times. Let X denote the no. of heads occurring and let Y denote the longest string of heads occurring. Find the joint distribution of X and Y. [VTU June 2011]
12) X and Y are independent random variables. X takes the values 1, 2 with
probability 0.7, 0.3 each and y takes the values −2, 5, 8 with probabilities
0.3, 0.5, 0.2. Find the joint distribution of X and Y . Hence find
COV (X, Y ). [VTU Jan 2018]
13) X and Y are independent random variables. X takes values 2, 5, 7 with probability 1/2, 1/4 and 1/4, respectively. Y takes values 3, 4, 5 with the probability 1/3, 1/3, 1/3.
14) Determine (i) marginal distribution (ii) covariance between the discrete random variables X and Y, of the joint probability distribution (VTU Model 2020)
X\Y    3       4       5
2      1/6     1/6     1/6
5      1/12    1/12    1/12
7      1/12    1/12    1/12
Ans : (i) p(x) = 1/2, 1/4, 1/4; p(y) = 1/3, 1/3, 1/3
(ii) COV(X, Y) = 0
15) The random variable X takes values 0, 1, 2 with probability 0.3, 0.3, 0.4 and the random variable Y takes values 1, 2, 3 with probability 0.2, 0.2, 0.6. If X and Y are independent random variables, (a) find the joint probability distribution of X and Y and (b) verify that Cov(X, Y) = 0
Ans :
X/Y    1       2       3
0      0.06    0.06    0.18
1      0.06    0.06    0.18
2      0.08    0.08    0.24
µX = 1.1, µY = 2.4, E[XY] = 2.64, Cov(X, Y) = 0
16) Find the unique fixed probability vector for the regular stochastic matrix
\[
A = \begin{bmatrix} 0 & 1 & 0 \\ \tfrac16 & \tfrac12 & \tfrac13 \\ 0 & \tfrac23 & \tfrac13 \end{bmatrix}
\]
1 1
0 2 2
0 1 0
1 2 1 1
17) Find the unique fixed probability vector of (a) A = 3 3
0 ,(b) B = 2
0 2
1 1 1
0 1 0 2 4 4
Ans: (a) 2 , 6, 1
9 9 9
5 6 4
(b) 15 , 15 , 15
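18) Find the unique fixed probability vector for the regular stochastic matrix,
\[
A = \begin{bmatrix} 0 & \tfrac34 & \tfrac14 \\ \tfrac12 & \tfrac12 & 0 \\ 0 & 1 & 0 \end{bmatrix}
\qquad \text{Ans: } v = \left(\tfrac{4}{13},\ \tfrac{8}{13},\ \tfrac{1}{13}\right)
\]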
19) Define Stochastic Matrix. Find the unique fixed probability vector for the regular stochastic matrix
\[
A = \begin{bmatrix} \tfrac12 & \tfrac14 & \tfrac14 \\ \tfrac12 & 0 & \tfrac12 \\ 0 & 1 & 0 \end{bmatrix}
\]
20) Prove that the Markov Chain whose t.p.m. is
\[
P = \begin{bmatrix} 0 & \tfrac23 & \tfrac13 \\ \tfrac12 & 0 & \tfrac12 \\ \tfrac12 & \tfrac12 & 0 \end{bmatrix}
\]
is irreducible. Find the corresponding stationary probability vector.
21) Show that
\[
P = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 1 \\ \tfrac12 & \tfrac12 & 0 \end{bmatrix}
\]
is a regular stochastic matrix and find the corresponding unique fixed probability vector. Ans: In $P^5$ all the entries are positive. Hence P is a regular stochastic matrix.
v = (1/5, 2/5, 2/5)
22) Three boys A, B, C are throwing a ball to each other. A always throws the ball to B and B always throws the ball to C. C is just as likely to throw the ball to B as to A. If C was the first person to throw the ball, find the probabilities that after three throws (i) A has the ball (ii) B has the ball (iii) C has the ball. Ans: After three throws the probability that the ball is with A is 1/4, with B is 1/4 and with C is 1/2.
23) A gambler’s luck follows a pattern such that if he wins a game, the probability of winning the next game is 0.6, and if he loses a game, the probability of losing the next game is 0.7. There is an even chance that the gambler wins the first game. What is the probability that he wins (i) the second game, (ii) the third game? (iii) In the long run, how often will he win? Ans : 9/20 and 87/200, v = [3/7, 4/7]
24) A student’s study habits are as follows: If he studies one night, he is 70% sure not to study the next night. On the other hand, if he does not study one night, he is 60% sure not to study the next night as well. In the long run, how often does he study? Ans: v = (4/11, 7/11)
25) A software engineer goes to his work place every day by motor bike or by car. He never goes by bike on two consecutive days, but if he goes by car on a day then he is equally likely to go by car or by bike the next day. Find the t.p.m. of the Markov chain. If car is used on the first day of the week, find the probability that (i) bike is used (ii) car is used, on the fifth day. Ans: 5/16, 11/16
26) A company executive changes his car every year. If he has a car of make A, he changes over to make B. From make B, he changes over to make C. However, if he has a car of make C, he is just as likely to change it for a car of make A as to change it for a car of make B. If he had a car of make C in the year 2008,
(i) Find the probability that he had a car of (a) make A in 2010 (b) make C in 2010 (c) make C in 2011 (d) make B in 2011
(ii) In the long run, how often will he have a car of make C ?
Ans: With states in the order (A, B, C) and $p^{(0)} = (0, 0, 1)$, we get $p^{(2)} = \left(0, \tfrac12, \tfrac12\right)$ and $p^{(3)} = \left(\tfrac14, \tfrac14, \tfrac12\right)$. (i) (a) $p_1^{(2)} = 0$ (b) $p_3^{(2)} = \tfrac12$ (c) $p_3^{(3)} = \tfrac12$ (d) $p_2^{(3)} = \tfrac14$. (ii) $v = \left(\tfrac15, \tfrac25, \tfrac25\right)$. In the long run, the proportion of time he will have a car of make C is 2/5, i.e., 40% of the time.
27) Every year, a man trades his car for a new car. If he has a Maruti, he trades it for an Ambassador. If he has an Ambassador, he trades it for a Santro. However, if he has a Santro, he is just as likely to trade it for a new Santro as to trade it for a Maruti or an Ambassador. In 2000 he bought his first car, which was a Santro. Find the probability that he has (i) 2002 Santro (ii) 2002 Maruti
28) A man’s smoking habits are as follows. If he smokes filter cigarettes one week, he switches to nonfilter cigarettes the next week with probability 0.2. On the other hand, if he smokes nonfilter cigarettes one week, there is a probability 0.7 that he switches to filter in the following week. In the long run how often does he smoke filter cigarettes? [VTU Jan 2018]
Ans: With the transition probabilities as stated, $v = \left(\tfrac79, \tfrac29\right)$; in the long run, he will smoke filter cigarettes 7/9, or about 77.8%, of the time.
29) A salesman’s territory consists of 3 cities A, B and C. He never sells in the same city on successive days. If he sells in city A, then the next day he sells in city B. However, if he sells in either B or C, then the next day he is twice as likely to sell in city A as in the other city. In the long run, how often does he sell in each of the cities? Ans: $v = \left(\tfrac25, \tfrac{9}{20}, \tfrac{3}{20}\right)$. In the long run he sells 40% of the time in city A, 45% in B, 15% of the time in C.
30) There are 2 white marbles in box A and 3 red marbles in box B. At each step of the process a marble is selected from each box and the two marbles selected are interchanged. Let the state $a_i$ of the system be the number i of red marbles in box A. (a) Find the transition matrix P. (b) What is the probability that there are 2 red marbles in box A after 3 steps? (c) In the long run, what is the probability that there are 2 red marbles in box A? Ans : (b) Probability that there are 2 red marbles in box A after 3 steps is 5/18. (c) Fixed probability vector: v = (0.1, 0.6, 0.3). In the long run, 30% of the time, there will be 2 red marbles in box A.
31) A habitual gambler is a member of two clubs A and B. He visits either of the clubs every day for playing cards. He never visits club A on two consecutive days. But, if he visits club B on a particular day, then the next day he is as likely to visit club B as club A. Find the transition matrix of this Markov chain. Also, (a) show that the matrix is a regular stochastic matrix and find the unique fixed probability vector. (b) If the person had visited club B on Monday, find the probability that he visits club A on Thursday. Ans: v = (1/3, 2/3), 3/8
32) Explain
(i) Regular and irregular Markov Chain
(ii) State Distribution and Higher Transition Probabilities
Statistical Inference 1
3.1 Sampling
• For example, consider the problem of buying 1 kilogram of rice. When we visit the shop, we do not check each and every rice grain stored in a gunny bag; rather, we put our hand inside the bag and collect a sample of rice grains, which we then examine. Based on this, we decide to buy or not. Thus, the problem involves studying the whole quantity of rice stored in a bag using only a sample of rice grains.
• The selection of individuals or items from the population in such a way that each has the same chance of being selected is called random sampling.
• The statistical constant of the population (such as mean and standard devia-
tion etc.) is referred as Parameter and the statistical constant of the Sample is
referred as Statistic.
• For every sample of size n, we can compute quantities like the mean, median, standard deviation etc.; these will generally differ from sample to sample. If we group these values according to their frequencies, the frequency distributions so generated are called Sampling Distributions. The sampling distribution of large samples is assumed to be a normal distribution.
3.2 Testing of Hypothesis:
• The procedure involves the following:
1) Null Hypothesis : First we set up a definite statement about the population parameter, which we call the null hypothesis, denoted by H0. The null hypothesis is the statement which is tested for possible rejection under the assumption that it is true.
2) Alternate Hypothesis : Next we set up another hypothesis, called the alternate hypothesis, which is complementary to the null hypothesis; it is denoted by H1. (i.e. the null and alternative hypotheses are complementary and the two together exhaust all possibilities regarding the value that the hypothesised parameter can assume.)
3) Specify the level of significance : The level of significance is defined as the probability of rejecting the null hypothesis when it is true. The higher the level of significance, the higher the probability of rejecting a null hypothesis when it is true.
Commonly used levels of significance in practice are 5% (= 0.05) and 1% (= 0.01).
4) Test of Significance : It enables us to decide, on the basis of the sample results, whether the deviation between the observed sample statistic and the hypothetical parameter value is significant. Choose a test statistic and calculate its value from the given data.
5) Accept or Reject the Null Hypothesis : If the absolute value of the calculated statistic is less than the tabulated (critical) value, we accept H0. Otherwise we reject H0 and accept H1.
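Step 5 of the procedure can be sketched as a small decision function for a two-tailed z-test. This is an illustrative sketch, not a prescribed method: `two_tailed_decision` is a hypothetical helper, and the critical value is taken from the standard normal distribution in Python's standard library instead of a printed table (it gives about 1.96 at the 5% level and about 2.58 at the 1% level).

```python
from statistics import NormalDist

def two_tailed_decision(z, alpha=0.05):
    """Compare |z| with the two-tailed critical value at significance level alpha."""
    critical = NormalDist().inv_cdf(1 - alpha / 2)   # e.g. about 1.96 for alpha = 0.05
    return "accept H0" if abs(z) < critical else "reject H0"

print(two_tailed_decision(1.5))           # well inside the acceptance region at 5%
print(two_tailed_decision(2.2, 0.05))     # beyond 1.96
print(two_tailed_decision(2.2, 0.01))     # still below 2.58
```

The same calculated statistic can therefore lead to rejection at the 5% level and acceptance at the 1% level, which is why the level of significance is fixed before the test statistic is computed.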
• Null Hypothesis (H0 ): The new drug has no effect on reducing blood pressure.
The average blood pressure for patients taking the new drug is the same as the
average blood pressure for the general population.
3.4
In this example:
- The null hypothesis (H0) represents the default assumption, suggesting no effect or no difference.
- The alternative hypothesis (H1 or Ha) represents the claim or the effect that the researcher is trying to provide evidence for.
The Critical Value serves as the basis for either Accepting or Rejecting a Hypothesis.
When α = 0.05, the Region of Rejection is 0.05 and the Region of Acceptance is
0.95
The value of Z corresponding to the 5% level of significance is ±1.96 and the value of Z corresponding to the 1% level of significance is ±2.58. The sets of Z-scores outside the ranges (−1.96, +1.96) and (−2.58, +2.58) constitute the critical region of the hypothesis, also called the region of rejection or the region of significance, at the 5% and 1% levels of significance respectively.
−1.96 and z = +1.96. In other words, we can say with 95% confidence that z lies between −1.96 and +1.96. Further, the critical value at the 5% level of significance is denoted by $z_{0.05}$.
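Thus we can write the verbal statement in the mathematical form,
\[
-1.96 \le \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \le 1.96
\]
\[
\text{i.e.,}\quad -\frac{\sigma}{\sqrt{n}}(1.96) \le \bar{x} - \mu \le \frac{\sigma}{\sqrt{n}}(1.96)
\]
\[
\Rightarrow\ \mu \le \bar{x} + \frac{\sigma}{\sqrt{n}}(1.96) \quad\text{and}\quad \bar{x} - \frac{\sigma}{\sqrt{n}}(1.96) \le \mu
\]
Thus, by combining the two results, we can write
\[
\bar{x} - 1.96\,\frac{\sigma}{\sqrt{n}} \le \mu \le \bar{x} + 1.96\,\frac{\sigma}{\sqrt{n}}
\]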
Confidence limits are the numbers at the upper and lower end of a confidence inter-
val.
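The confidence limits $\bar{x} \pm 1.96\,\sigma/\sqrt{n}$ can be computed as follows. This is a minimal sketch: `mean_confidence_interval` is a hypothetical helper, and the sample values (mean 50, σ = 10, n = 100) are made up for illustration.

```python
from math import sqrt

def mean_confidence_interval(x_bar, sigma, n, z=1.96):
    """Confidence limits x_bar -/+ z * sigma / sqrt(n); z = 1.96 gives 95%, 2.58 gives 99%."""
    margin = z * sigma / sqrt(n)
    return (x_bar - margin, x_bar + margin)

# Illustrative values only (not taken from the notes).
low, high = mean_confidence_interval(50.0, 10.0, 100)
print(round(low, 2), round(high, 2))
```

Passing z = 2.58 instead of the default gives the wider 99% interval.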
The z-test is a statistical test that is used to determine whether the mean of a sample is significantly different from a known population mean when the population standard deviation is known. It is particularly useful when the sample size is large (n > 30).
Suppose that we have a normal population with mean µ and S.D. as σ. If x̄ is the
sample mean of a random sample of size n(where n > 30).
Some commonly used notations in sampling distributions are given below:
                      Population    Sample
Size                  N             n
Mean                  µ             x̄
Variance              σ²            s²
Standard Deviation    σ             s
Proportion            P             p
Suppose we take various samples each of size n from a population. If p and q are the probabilities of success and failure of each member of the sample, then the binomial distribution given by $(p + q)^n$ provides the sampling distribution of the number of successes in the sample with mean $\mu = np$ and standard deviation $\sigma = \sqrt{npq}$.
The mean and standard deviation of the proportion of successes are obtained by dividing each statistic by n.
• Mean (expected value) of proportion of successes $= \dfrac{np}{n} = p$
• Standard deviation of proportion $= \dfrac{\sqrt{npq}}{n} = \sqrt{\dfrac{pq}{n}}$
• Probable occurrence range (confidence interval) of the proportion at 99% confidence level, i.e. 1% significance level, is given by: $p \pm 2.58\sqrt{\dfrac{pq}{n}}$
• Probable occurrence range (confidence interval) of the proportion at 95% confidence level, i.e. 5% significance level, is given by: $p \pm 1.96\sqrt{\dfrac{pq}{n}}$
Suppose that we have a normal population with mean µ and S.D. σ. When conducting a test of significance for a single mean, the aim is to determine whether the mean of a sample, x̄, is significantly different from a known or hypothesized population mean, µ.
• Null Hypothesis (H0 ): Assumes that there is no significant difference, and any
observed difference is due to random chance.
The test statistic is a numerical value calculated from the sample data.
If x̄ is the sample mean of a random sample of size n (where n > 30), then to test whether the difference between sample mean and population mean is significant, the test statistic is :
\[
z = \frac{\bar{x} - \mu}{e}
\]
where the standard error e is given by
\[
e^2 = \frac{\sigma^2}{n}
\]
Critical values of z at 5% and 1% level of significance are respectively 1.96 and 2.58.
If |z| < 1.96, we accept the hypothesis that there is no significant difference between the population mean and sample mean at 5% level of significance.
\[
z = \frac{x - \mu}{\sigma} = \frac{x - np}{e}
\]
where the standard error e is given by the formula
\[
e^2 = npq
\]
This can be particularly useful in situations where a binomial distribution with a large
sample size is involved, and normal approximation simplifies the analysis.
Therefore we have the following test of significance:
(i) If |z| < 1.96, the difference between the observed and expected number of successes is not significant (at 5% level of significance).
(ii) If |z| < 2.58, the difference between the observed and expected number of successes is not significant (at 1% level of significance).
(iii) If |z| > 1.96, the difference is significant (at 5% level of significance).
(iv) If |z| > 2.58, the difference is significant (at 1% level of significance).
To test the significance of the difference between the sample proportion p and the population proportion P we use the statistic (z-test) which is calculated as :
\[
z = \frac{p - P}{e}
\]
where the standard error e is given by the formula
\[
e^2 = \frac{PQ}{n} \quad\text{and}\quad Q = 1 - P.
\]
When comparing the means of two independent samples from two different populations with large sizes, the z-test for the difference between means can be used.
To test whether the means are equal, we define the null and alternate hypotheses as
\[
H_0 : \mu_1 = \mu_2 \quad\text{and}\quad H_1 : \mu_1 \ne \mu_2
\]
and the test statistic is calculated as :
\[
z = \frac{\bar{x}_1 - \bar{x}_2}{e}
\]
with the standard error e calculated as:
\[
e^2 = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}
\]
If $|z| < z_{0.05} = 1.96$, we accept the hypothesis that there is no significant difference between the population means µ1 and µ2 at 5% level of significance.
If $|z| < z_{0.01} = 2.58$, we accept the hypothesis that there is no significant difference between the population means µ1 and µ2 at 1% level of significance.
When comparing the means of two independent samples from the same population with S.D. σ, the test statistic and the procedure are similar to the case of different populations. Here, σ1 = σ2 = σ. Hence to test whether the difference between the sample means x̄1 and x̄2 is significant or is merely due to fluctuations of sampling, the test statistic is calculated as :
\[
z = \frac{\bar{x}_1 - \bar{x}_2}{e}
\]
with the standard error e calculated as:
\[
e^2 = \frac{\sigma^2}{n_1} + \frac{\sigma^2}{n_2}
\quad\text{or}\quad
e = \sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}
\]
The decision is made by comparing the z-statistic to critical values.
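The two-sample statistic $z = (\bar{x}_1 - \bar{x}_2)/e$ with $e^2 = \sigma_1^2/n_1 + \sigma_2^2/n_2$ can be sketched as below. The function name `z_two_means` and all numerical values are hypothetical, chosen only to show the arithmetic; setting both standard deviations equal covers the same-population case.

```python
from math import sqrt

def z_two_means(x1_bar, x2_bar, sigma1, sigma2, n1, n2):
    """z statistic for the difference of two sample means (large samples)."""
    e = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)   # standard error of the difference
    return (x1_bar - x2_bar) / e

# Illustrative values only: two samples of size 100 with common S.D. 5.
z = z_two_means(52.0, 50.0, 5.0, 5.0, 100, 100)
print(round(z, 3))
```

For these made-up values |z| exceeds both 1.96 and 2.58, so the difference between the means would be judged significant at both levels.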
3.12 Test of significance of Difference between two sample proportions
Suppose two large samples of sizes n1, n2 are taken from two similar populations, giving sample proportions p1, p2 respectively. To test whether the proportions P1 and P2 of the two populations are equal, we define the null and alternate hypotheses as
\[
H_0 : P_1 = P_2 \quad\text{and}\quad H_1 : P_1 \ne P_2
\]
and then we use the test statistic
\[
z = \frac{p_1 - p_2}{e}
\]
where the standard error e is given by
\[
e^2 = \frac{p_1 q_1}{n_1} + \frac{p_2 q_2}{n_2}
\]
where $q_1 = 1 - p_1$ and $q_2 = 1 - p_2$.
• Here, the null hypothesis (H0 ) and alternative hypothesis (H1 or Ha ) are stated
as follows:
H0 : p1 = p2
H1 : p1 ̸= p2
where p1 and p2 are the sample proportions.
Problem 66. A sample of 100 tyres is taken from a lot. The mean life of a tyre is
found to be 39350 kms with a SD of 3260. Can it be considered as the true random
sample from population with mean life of 40000 kms? ( use 5% significance level).
[VTU: DEC/JAN 16]
Solution: First we set up the null hypothesis H0 : µ = 40,000 and the alternate hypothesis H1 : µ ≠ 40,000.
The problem calls for a two-tailed test, and we choose α = 5%. The corresponding tabulated value is 1.96.
The test criterion is
$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} \qquad (*) $$
Here µ = 40,000, x̄ = 39,350, σ = 3,260 and n = 100, so
$$ \text{S.E.} = \frac{\sigma}{\sqrt{n}} = \frac{3260}{\sqrt{100}} = 326. $$
Thus, from (*), z = (39,350 − 40,000)/326 = −1.994, i.e. |z| = 1.994.
As this value is slightly greater than 1.96, we reject the null hypothesis and conclude that the sample has not come from a population with mean life 40,000 km.
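The arithmetic of Problem 66 can be checked with a few lines of Python. This is a minimal sketch; the helper name `one_sample_z` is an assumption introduced here for illustration.

```python
import math

def one_sample_z(sample_mean, pop_mean, sd, n):
    """z statistic for a large-sample test of a single mean."""
    return (sample_mean - pop_mean) / (sd / math.sqrt(n))

z = one_sample_z(39350, 40000, 3260, 100)
print(round(z, 3))       # -1.994
print(abs(z) > 1.96)     # True: H0 is rejected at the 5% level
```

The sign of z is negative because the sample mean lies below the hypothesised mean; only |z| matters for the two-tailed decision.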
Problem 67. A die is tossed 960 times and 5 appear 184 times, is the die biased?
[VTU:JUNE/JULY-15, 2006]
$$ \sqrt{npq} = \sqrt{960 \times \frac{1}{6} \times \frac{5}{6}} = \sqrt{133.33} = 11.55 $$
Hence
$$ z = \frac{x - np}{\sqrt{npq}} = \frac{24}{11.55} = 2.078 > 1.96 $$
We reject the hypothesis at 5% level of significance and we conclude that the die is
biased.
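The same test on the number of successes can be sketched in Python. The function name `binomial_z` is hypothetical; the data are those of Problem 67 (n = 960 throws, 184 fives, p = 1/6 for a fair die).

```python
import math

def binomial_z(successes, n, p):
    """z statistic comparing observed successes with the expected n*p."""
    q = 1 - p
    return (successes - n * p) / math.sqrt(n * p * q)

z = binomial_z(184, 960, 1 / 6)
print(round(z, 2))       # 2.08
print(abs(z) > 1.96)     # True: significant at the 5% level
```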
Problem 68. In 324 throws of a six faced ‘die’ an odd number turned up 181 times.
Is it reasonable to think that the die is an unbiased one at 1% level of significance.?
[VTU July 2017]
Problem 70. A manufacturing company claims that at least 95% of its products supplied conform to the specifications. Out of a sample of 200 products, 18 are defective. Test the claim at 5% LOS.
The test statistic is
$$ Z = \frac{p - P}{\sqrt{PQ/n}}, $$
where P + Q = 1 ⇒ Q = 1 − P.
Given that 18 products are defective out of 200 sample products,
∴ the number of non-defective products = 200 − 18 = 182
$$ \therefore p = \frac{182}{200} = 0.91 $$
$$ \therefore Z = \frac{0.91 - 0.95}{\sqrt{\dfrac{0.95 \times 0.05}{200}}} = -\frac{0.04}{\sqrt{0.0002375}} = -2.5955 $$
At 5% level, the tabulated value of Zα is 1.96. Since |Z| = 2.5955 > 1.96, the null hypothesis is rejected at 5% level of significance.
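A short Python sketch of this single-proportion test, using the figures of Problem 70. The function name `one_proportion_z` is an assumption made for illustration only.

```python
import math

def one_proportion_z(p_sample, p_claimed, n):
    """z statistic for a sample proportion against a claimed proportion P."""
    q_claimed = 1 - p_claimed
    return (p_sample - p_claimed) / math.sqrt(p_claimed * q_claimed / n)

z = one_proportion_z(182 / 200, 0.95, 200)
print(z)                 # about -2.5955
print(abs(z) > 1.96)     # True: the claim is rejected at the 5% level
```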
Problem 71. A stenographer claims that she can type at the rate of 120 words per minute. Can we reject her claim on the basis of 100 trials in which she demonstrates a mean of 116 words with a standard deviation of 15 words? Use 5% level of significance. (VTU Model 2020)
Problem 72. In a large city A, 20% of a random sample of 900 school boys had a
slight physical defect. In another large city B, 18.5% of a random sample of 1600
school boys had the same defect. Is the difference between the proportions significant
at 5% significance level? (VTU Model 2023)
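The test statistic is
$$ z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} $$
where P + Q = 1 ⇒ Q = 1 − P.
Given n1 = 900, n2 = 1600, p1 = 20/100 = 0.2, p2 = 18.5/100 = 0.185,
$$ P = \frac{n_1 p_1 + n_2 p_2}{n_1 + n_2} = \frac{(900)(0.2) + (1600)(0.185)}{900 + 1600} = 0.1904 $$
⇒ Q = 1 − P = 1 − 0.1904 = 0.8096
$$ \Rightarrow z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.2 - 0.185}{\sqrt{(0.1904)(0.8096)\left(\frac{1}{900} + \frac{1}{1600}\right)}} = \frac{0.015}{0.01636} = 0.917 $$
Since |z| = 0.917 < 1.96, the difference between the proportions is not significant at 5% level of significance.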
this sample is from a population of mean 165 cm and S.D 10 cm? (VTU Model
2020)
The test statistic is given by
$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{160 - 165}{10/\sqrt{100}} = -5 $$
∴ z = −5 [Calculated value]
At 5% significance level the tabulated value for zα is 1.96.
But |z| > 1.96, so we reject H0.
Conclusion:
There is a significant difference between the sample mean and the population mean.
Problem 75. The mean life time of a sample of 100 fluorescent tube lights manufac-
tured by a company is found to be 1570 hrs with a standard deviation of 120 hrs.
Test the hypothesis that the mean life-time of the lights produced by the company is
1600 hrs at 0.01 level of significance. (VTU Model 2020)
Solution: Let H0 : the mean life-time of the lights produced by the company is 1600 hrs.
Given x̄ = 1570, n = 100, s = 120, µ = 1600.
Apply the formula to calculate the z score (σ is estimated by s, since the sample is large):
$$ z = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{1570 - 1600}{120/\sqrt{100}} = -2.5 $$
|z| = 2.5 < z0.01 = 2.58. Therefore, we accept the hypothesis H0 at 0.01 level of significance.
Problem 76. It is claimed that a random sample of 49 tyres has a mean life of 15,200 kms. Is the sample drawn from a population whose mean is 15,150 kms and whose standard deviation is 1,200 kms? Test the significance at 0.05 level. (VTU Model 2020)
Problem 77. In a sample of 600 men from a certain city, 450 are found smokers. In
another sample of 900 men from another city, 450 are smokers. Do the data indicate
that the cities are significantly different with respect to the habit of smoking among
men. Test at 5% significance level. (VTU Model 2023)
Solution : For sample 1, the sample proportion is p̂1 = X1/N1 = 450/600 = 0.75.
For sample 2, the sample proportion is p̂2 = X2/N2 = 450/900 = 0.5.
The value of the pooled proportion is computed as
$$ \bar{p} = \frac{X_1 + X_2}{N_1 + N_2} = \frac{450 + 450}{600 + 900} = 0.6 $$
The following null and alternative hypotheses for the population proportions need to be tested:
H0 : p1 = p2
H1 : p1 ≠ p2
This corresponds to a two-tailed test, and a z-test for two population proportions will be used. Based on the information provided, the significance level is α = 0.05, and the critical value for a two-tailed test is zc = 1.96.
The rejection region for this two-tailed test is R = {z : |z| > 1.96}.
The z-statistic is computed as follows:
$$ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\bar{p}(1 - \bar{p})\left(\frac{1}{N_1} + \frac{1}{N_2}\right)}} = \frac{0.75 - 0.5}{\sqrt{0.6(1 - 0.6)\left(\frac{1}{600} + \frac{1}{900}\right)}} \approx 9.68 $$
Since it is observed that |z| = 9.68 > 1.96 = zc, it is then concluded that the null hypothesis is rejected.
Using the P-value approach: The p-value is p = 2P(Z > 9.68) ≈ 0, and since p < 0.05 = α, it is concluded that the null hypothesis is rejected.
Therefore, there is enough evidence to claim that the population proportion p1 is different from p2, at the α = 0.05 significance level.
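The pooled two-proportion z-test and the p-value approach can be sketched in Python with the standard library. This is an illustration under stated assumptions: the function names are hypothetical, and the p-value uses `statistics.NormalDist` for the standard normal distribution. The counts are those of Problem 77 (450 smokers out of 600 and 450 out of 900).

```python
import math
from statistics import NormalDist

def two_proportion_z(x1, n1, x2, n2):
    """Pooled z statistic for the difference between two sample proportions."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)
    std_error = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    return (p1 - p2) / std_error

def two_tailed_p_value(z):
    """Probability of a standard normal value at least as extreme as z."""
    return 2 * (1 - NormalDist().cdf(abs(z)))

z = two_proportion_z(450, 600, 450, 900)
print(round(z, 2))                       # 9.68
print(two_tailed_p_value(z) < 0.05)      # True: H0 is rejected
```

With these counts the pooled proportion is 900/1500 = 0.6, which is the value the z-statistic depends on.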
Problem 78. A sample of 900 members is found to have a mean of 3.4 cm. Can it be reasonably regarded as a truly random sample from a large population with mean 3.25 cm and S.D. 1.61 cm?
Solution :
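Problem 80. One type of aircraft is found to develop engine trouble in 5 flights out of a total of 100 and another type in 7 flights out of a total of 200 flights. Is there a significant difference in the two types of aircraft so far as engine defects are concerned? Test at 5% significance level. (VTU Model 2023)
Solution :
$$ p_1 = \frac{5}{100} = 0.05, \qquad p_2 = \frac{7}{200} = 0.035 $$
H0 : p1 = p2
H1 : p1 ≠ p2
$$ P = \frac{5 + 7}{100 + 200} = 0.04, \qquad Q = 1 - P = 0.96 $$
$$ z = \frac{p_1 - p_2}{\sqrt{PQ\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} = \frac{0.05 - 0.035}{\sqrt{(0.04)(0.96)\left(\frac{1}{100} + \frac{1}{200}\right)}} = \frac{0.015}{0.024} = 0.625 $$
|z| = 0.625 < 1.96
Hence H0 is accepted: there is no significant difference between the two types of aircraft so far as engine defects are concerned.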
Problem 81. The means of simple samples of sizes 1000 and 2000 are 67.5 and
68.0 cm respectively. Can the samples be regarded as drawn from the same popu-
lation of S.D. 2.5 cm.
Solution : We have
x̄1 = 67.5, x̄2 = 68.0
n1 = 1000, n2 = 2000.
On the hypothesis that the samples are drawn from the same population of S.D. σ = 2.5, we get
$$ z = \frac{|\bar{x}_1 - \bar{x}_2|}{\sigma\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} = \frac{|67.5 - 68.0|}{2.5\sqrt{\frac{1}{1000} + \frac{1}{2000}}} = \frac{0.5}{2.5 \times 0.0387} = \frac{0.5}{0.09675} \approx 5.17 $$
Hence the value of z, i.e., 5.17, is very much greater than 1.96, and the difference between the sample means is therefore significant. Thus, the samples cannot be regarded as drawn from the same population.
Problem 82. A sample of height of 6400 soldiers has a mean of 67.85 inches and a
standard deviation of 2.56 inches while a simple sample of heights of 1600 sailors
has a mean of 68.55 inches and a standard deviation of 2.52 inches. Do the data
indicate that the sailors are on the average taller than soldiers?
Solution : Here
x̄1 = 67.85, σ1 = 2.56, n1 = 6400
x̄2 = 68.55, σ2 = 2.52, n2 = 1600.
∴ S.E. of the difference of the mean heights is
$$ e = \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} = \sqrt{\frac{(2.56)^2}{6400} + \frac{(2.52)^2}{1600}} = \sqrt{0.001024 + 0.003969} = 0.07 \text{ nearly.} $$
Also, the difference between the means = x̄2 − x̄1 = 0.7, which is about 10e. This is highly significant. Hence the data indicate that the sailors are on the average taller than the soldiers.
95% confidence limits for the mean of the population corresponding to a given sample are
$$ \bar{x} \pm 1.96\,(\sigma/\sqrt{n}) $$
and 99% confidence limits for the mean are
$$ \bar{x} \pm 2.58\,(\sigma/\sqrt{n}). $$
95% confidence limits for the proportion of the population corresponding to a given sample are
$$ p \pm 1.96\sqrt{\frac{pq}{n}} $$
and 99% confidence limits for the proportion are
$$ p \pm 2.58\sqrt{\frac{pq}{n}}. $$
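These large-sample confidence limits translate directly into code. The sketch below is illustrative only: the function names `mean_limits` and `proportion_limits` and the numbers passed to them are assumptions, not data from the notes.

```python
import math

def mean_limits(sample_mean, sd, n, z=1.96):
    """Confidence limits for a population mean (large sample)."""
    margin = z * sd / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

def proportion_limits(p, n, z=1.96):
    """Confidence limits for a population proportion (large sample)."""
    margin = z * math.sqrt(p * (1 - p) / n)
    return p - margin, p + margin

# Hypothetical figures: use z=1.96 for 95% limits and z=2.58 for 99% limits
print(mean_limits(50.0, 10.0, 100))
print(proportion_limits(0.5, 400, z=2.58))
```

Keeping z as a parameter makes the 95% and 99% cases the same computation with a different multiplier.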
Problem 83. A sample of 900 days was taken in a coastal town and it was found that
on 100 days the weather was very hot. Obtain the probable limits of the percentage
of very hot weather.
Problem 84. In a sample of 500 men it was found that 60% of them were overweight. What can we infer about the proportion of people who are overweight in the population?
Solution : Proportion of persons who are overweight = p = 60/100 = 0.6 and q = 1 − p = 0.4.
$$ \text{Probable limits} = p \pm 2.58\sqrt{\frac{pq}{n}} = 0.6 \pm 2.58\sqrt{\frac{(0.6)(0.4)}{500}} = 0.6 \pm 0.0565 $$
Probable limits = 0.5435 and 0.6565
Thus the probable limits of the proportion of overweight people are 54.35% to 65.65%.
Problem 85. To know the mean weights of all 10 year old boys in Delhi a sample of
225 was taken. The mean weight of the sample was found to be 67 pounds with S.D
of 12 pounds. What can we infer about the mean weight of the population?
Solution: Sample mean (x̄) = 67, Sample size n = 225, S.D (σ) = 12
95% confidence limits for the mean of the population corresponding to a given sample are $\bar{x} \pm 1.96(\sigma/\sqrt{n})$
and 99% confidence limits for the mean are $\bar{x} \pm 2.58(\sigma/\sqrt{n})$.
We have $\sigma/\sqrt{n} = 12/15 = 0.8$
95% confidence limits : 67 ± 1.96(0.8) = 65.432 and 68.568
99% confidence limits : 67 ± 2.58(0.8) = 64.936 and 69.064
We can say with 95% confidence that the mean weight of the population lies between 65.4 pounds and 68.6 pounds. Also with 99% confidence we can say that the mean weight lies between 64.9 pounds and 69.1 pounds.
Problem 86. A sample of 100 days is taken from meteorological records of a certain district and 10 of them are found to be foggy. What are the probable limits of the percentage of foggy days in the district?
Problem 87. A survey was conducted in a slum locality of 2000 families by selecting
a sample of size 800. It was revealed that 180 families were illiterates. Find the
probable limits of the illiterate families in the population of 2000.
Here p = 180/800 = 0.225 and q = 1 − p = 0.775.
$$ \text{Probable limits} = 0.225 \pm 2.58\sqrt{\frac{(0.225)(0.775)}{800}} = 0.225 \pm 0.038 $$
i.e., 0.187 and 0.263.
∴ the probable limits of illiterate families in the population of 2000 are (0.187)(2000) and (0.263)(2000).
Thus 374 to 526 families are probably illiterate.
Problem 88. A random sample of 500 apples was taken from a large consignment
and 65 were found to be bad. Estimate the proportion of bad apples in the consign-
ment as well as the standard error of the estimate. Also find the percentage of bad
apples in the consignment.
Solution : Proportion of families having monthly income of Rs. 2500 or less is given by
p = 206/840 = 0.245. Hence q = 1 − p = 0.755
$$ \text{S.E. of proportion} = \sqrt{pq/n} = \sqrt{(0.245 \times 0.755)/840} = 0.015 $$
Probable limits of families having monthly income of Rs. 2500 or less are $p \pm 2.58\sqrt{pq/n}$. That is,
= 0.245 ± (2.58)(0.015)
= 0.245 ± 0.0387
= 0.2063 and 0.2837 or 20.63% and 28.37%
Hence the probable limits in respect of 18,000 families are given by
0.2063 × 18,000 and 0.2837 × 18,000
That is 3713.4 and 5106.6 or 3713 and 5107
Thus we say that 3713 to 5107 families are likely to have monthly income of Rs. 2500 or less.
Problem 90. The mean and S.D of the maximum loads supported by 60 cables are
11.09 tonnes and 0.73 tonnes respectively. Find (a) 95% (b) 99% confidence limits
for mean of the maximum loads of all cables produced by the company.
Solution: By data x̄ = 11.09, σ = 0.73, n = 60.
(a) 95% confidence limits for the mean of maximum loads are given by
$$ \bar{x} \pm 1.96(\sigma/\sqrt{n}) = 11.09 \pm 1.96(0.73/\sqrt{60}) = 11.09 \pm 0.18 $$
Thus 10.91 tonnes to 11.27 tonnes are the 95% confidence limits for the mean of maximum loads.
(b) 99% confidence limits for the mean of maximum loads are given by
$$ \bar{x} \pm 2.58(\sigma/\sqrt{n}) = 11.09 \pm 2.58(0.73/\sqrt{60}) = 11.09 \pm 0.24 = 10.85 \text{ and } 11.33 $$
Thus 10.85 tonnes to 11.33 tonnes are the 99% confidence limits for the mean of maximum loads.
Problem 91. A coin was tossed 400 times and the head turned up 216 times. Test the
hypothesis that coin is unbiased at 5% level of significance. (VTU Model 2023 July
2013, 2007)
$$ z = \frac{x - np}{\sqrt{npq}} = \frac{216 - 200}{10} = 1.6 $$
If we choose the significance level α = 5%, then the tabulated value is 1.96. Since the calculated value is less than the tabulated value, we accept the null hypothesis that the coin is unbiased.
Problem 92. A die was thrown 9000 times and a throw of 5 or 6 was obtained 3240
times. On the assumption of random throwing, do the data indicate an unbiased die?
[VTU Model 2023 , Model 2020, Jan 2020, Jan 2018, 2010]
Problem 93. In a locality containing 18000 families, a sample of 840 families was selected at random. Of these 840 families, 206 families were found to have a monthly income of Rs. 250 or less. It is desired to estimate how many out of 18,000 families have a monthly income of Rs. 250 or less. Within what limits would you place your estimate?
Solution: Here
$$ p = \frac{206}{840} = \frac{103}{420} \quad\text{and}\quad q = \frac{317}{420} $$
∴ standard error of the proportion of families having a monthly income of Rs. 250 or less
$$ = \sqrt{\frac{pq}{n}} = \sqrt{\frac{103}{420} \times \frac{317}{420} \times \frac{1}{840}} = 0.015 = 1.5\% $$
Hence taking 103/420 (or 24.5%) to be the estimate of families having a monthly income of Rs. 250 or less in the locality, the limits are (24.5 ± 3 × 1.5)%, i.e., 20% and 29% approximately.
Question Bank
1. A die is tossed 960 times and it falls with 5 upwards 184 times. Is the die
biased?
2. A coin is tossed 1000 times and head turns up 540 times. Test the hypothesis
that the coin is an unbiased one.
3. It is claimed that a random sample of 49 tyres has a mean life of 15,200 kms. Is the sample drawn from a population whose mean is 15,150 kms and whose standard deviation is 1,200 kms? Test the significance at 0.05 level. (VTU Model 2020)
4. A sample of 900 members is found to have a mean of 3.4 cm. Can it be rea-
sonably regarded as a truly random sample from a large population with mean
3.25 cm and S.D. 1.61 cm.
6. One type of aircraft is found to develop engine trouble in 5 flights out of a total of 100 and another type in 7 flights out of a total of 200 flights. Is there a significant difference in the two types of aircraft so far as engine defects are concerned? Test at 5% significance level. (VTU Model 2023)
7. In a sample of 600 men from a certain city, 450 are found smokers. In another
sample of 900 men from another city, 450 are smokers. Do the data indicate
that the cities are significantly different with respect to the habit of smoking
among men. Test at 5% significance level.
Ans: z = 9.68
8. In a large city A, 20% of a random sample of 900 school boys had a slight
physical defect. In another large city B, 18.5% of a random sample of 1600
school boys had the same defect. Is the difference between the proportions
significant at 5% significance level? (VTU Model 2023)
9. A sample of 100 tyres is taken from a lot. The mean life of a tyre is found to be
39350 kms with a SD of 3260. Can it be considered as the true random sample
from population with mean life of 40000 kms? ( use 5% significance level).
10. In two large populations there are 30% and 25% respectively of fair haired
people. Is this difference likely to be hidden in samples of 1200 and 900 re-
spectively from the two populations?
11. A stenographer claims that she can type at the rate of 120 words per minute. Can we reject her claim on the basis of 100 trials in which she demonstrates a mean of 116 words with a standard deviation of 15 words? Use 5% level of significance.
12. 12 dice are thrown 3086 times and a throw of 2, 3, 4 is reckoned as a success.
Suppose that 19142 throws of 2, 3, 4 have been made out. Do you think that
this observed value deviates from the expected value? If so, can the deviation
from the expected value be due to fluctuations of simple sampling?
13. A sample of 100 students is taken from a large population. The mean height
of the students in this sample is 160cm. Can it be reasonably regarded that this
sample is from a population of mean 165 cm and S.D 10 cm?
14. A die is tossed 960 times and 5 appear 184 times, is the die biased?
15. In 324 throws of a six faced ‘die’ an odd number turned up 181 times. Is it
reasonable to think that the die is an unbiased one at 1% level of significance.?
Ans: z=2.11, accept
16. In a sample of 600 men from a certain city, 450 are found smokers. In another
sample of 900 men from another city, 450 are smokers. Do the data indicate
that the cities are significantly different with respect to the habit of smoking
among men. Test at 5% significance level.
17. The means of simple samples of sizes 1000 and 2000 are 67.5 and 68.0 cm respectively. Can the samples be regarded as drawn from the same population of S.D. 2.5 cm?
Ans : z ≈ 5.17
18. A sample of height of 6400 soldiers has a mean of 67.85 inches and a standard
deviation of 2.56 inches while a simple sample of heights of 1600 sailors has
a mean of 68.55 inches and a standard deviation of 2.52 inches. Do the data
indicate that the sailors are on the average taller than soldiers?
Ans : reject
19. In a group of 50 first cousins there were found to be 27 males and 23 females. Ascertain if the observed proportions are inconsistent with the hypothesis that the sexes should be in equal proportion.
20. A random sample of 500 apples was taken from a large consignment and 65
were found to be bad. Estimate the proportion of the bad apples in the consign-
ment as well as the standard error of the estimate. Deduce that the percentage
of bad apples in the consignment almost certainly lies between 8.5 and 17.5.
21. The mean life time of a sample of 100 fluorescent tube lights manufactured by
a company is found to be 1570 hrs with a standard deviation of 120 hrs. Test
the hypothesis that the mean life-time of the lights produced by the company is
1600 hrs at 0.01 level of significance.
23. In a sample of 500 people from a state, 280 take tea and the rest take coffee. Can we assume that tea and coffee are equally popular in the state at 5% level of significance? Ans: 2.68, reject
24. A sample of 400 items is taken from a normal population whose mean is 4 and
variance 4 . If the sample mean is 4.45 , can the samples be regarded as a simple
sample?
25. To know the mean weights of all 10-year old boys in Delhi, a sample of 225 is taken. The mean weight of the sample is found to be 67 pounds with a S.D. of 12 pounds. Can you draw any inference from it about the mean weight of the population?
26. A normal population has a mean 0.1 and a S.D. of 2.1. Find the probability that
the mean of simple sample of 900 members will be negative.
27. If the mean breaking strength of copper wire is 575 lbs. with a standard deviation of 8.3 lbs., how large a sample must be used in order that there be one chance in 100 that the mean breaking strength of the sample is less than 572 lbs.? [Hint. $|z| = \frac{|\bar{x} - \mu|}{\sigma}\sqrt{n} = \frac{3}{8.3}\sqrt{n}$. Also from table IV, z = 2.33. Hence n = 42 nearly.]
32. A random sample of 1000 men from North India shows that their mean wage is Rs. 5 per day with a S.D. of Rs. 1.50. A sample of 1500 men from South India gives a mean wage of Rs. 4.50 per day with a standard deviation of Rs. 2. Does the mean rate of wages vary between the two regions?
34. A survey was conducted in a slum locality of 2000 families by selecting a sample of size 800. It was revealed that 180 families were illiterates. Find the probable limits of the illiterate families in the population of 2000.
35. To know the mean weights of all 10 year old boys in Delhi a sample of 225 was
taken. The mean weight of the sample was found to be 67 pounds with S.D of
12 pounds. What can we infer about the mean weight of the population?
36. A sample of 900 days was taken in a coastal town and it was found that on 100
days the weather was very hot. Obtain the probable limits of the percentage of
very hot weather.
37. In a sample of 500 men it was found that 60% of them were overweight. What can we infer about the proportion of people who are overweight in the population?
38. In a locality of 18000 families a sample of 840 families was selected at random. Of these 840 families, 206 families were found to have monthly income of Rs. 2500 or less. It was desired to estimate how many of the 18,000 families have monthly income of Rs. 2500 or less. Within what limits would you place your estimate?
39. The mean and S.D of the maximum loads supported by 60 cables are 11.09
tonnes and 0.73 tonnes respectively. Find (a) 95% (b) 99% confidence limits
for mean of the maximum loads of all cables produced by the company.
40. 400 children are chosen in an industrial town and 150 are found to be under
weight. Assuming the conditions of simple sampling, estimate the percentage
of children who are under weight in the industrial town and assign limits within
which the percentage probably lies?
Statistical Inference II
Syllabus :
Sampling variables, central limit theorem and confidence limits for unknown mean, Test of Significance for means of two small samples, Student's 't' distribution, Chi-square distribution as a test of goodness of fit. F-Distribution.
If X̄ is the mean of a random sample of size n taken from a population with mean µ and finite variance σ², then the limiting form of the distribution of
$$ Z = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}, $$
as n → ∞, is the standard normal distribution (with mean 0 and S.D. 1).
In a population whose distribution may be known or unknown, if the size (n) of samples is sufficiently large, the distribution of the sample means will be approximately normal.
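The central limit theorem can be illustrated with a small simulation. The sketch below is not from the notes: it assumes an exponential population with mean 1 and S.D. 1 (chosen because it is clearly non-normal), and the helper name `sample_means`, the sample size and the number of repetitions are arbitrary choices.

```python
import random
import statistics

def sample_means(draw, n, repetitions, seed=0):
    """Means of `repetitions` independent samples of size n drawn via `draw`."""
    rng = random.Random(seed)
    return [statistics.fmean(draw(rng) for _ in range(n))
            for _ in range(repetitions)]

# Exponential population: mu = 1, sigma = 1
means = sample_means(lambda rng: rng.expovariate(1.0), n=50, repetitions=2000)

print(statistics.fmean(means))   # close to mu = 1
print(statistics.stdev(means))   # close to sigma / sqrt(n) = 1 / sqrt(50)
```

The sample means cluster around µ with spread close to σ/√n, which is the standard error used in every z-statistic of this module.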
Problem 94. An electrical firm manufactures light bulbs that have a length of life that
is approximately normally distributed, with mean equal to 800 hours and a standard
deviation of 40 hours. Find the probability that a random sample of 16 bulbs will
have an average life of less than 775 hours.
$$ P(\bar{X} < 775) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{775 - \mu}{\sigma/\sqrt{n}}\right) $$
= P(Z < −2.5)
= Area(−∞ to −2.5)
= Area(2.5 to ∞)
= A(∞) − A(2.5)
= 0.0062
15. A sample size of 80 is drawn randomly from the population. Find the probability that the sum of the 80 values (or the total of the 80 values) is more than 7400.
Solution: Let X = one value from the original unknown population. The probability question asks you to find a probability for the sample mean. Let X̄ = the mean of a sample of size 25. Since µ = 90, σ = 15, and n = 25,
$$ P(85 < \bar{x} < 92) = P\left(\frac{85 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{92 - \mu}{\sigma/\sqrt{n}}\right) = P\left(\frac{85 - 90}{15/\sqrt{25}} < z < \frac{92 - 90}{15/\sqrt{25}}\right) $$
= P(−1.67 < z < 0.67)
= Area under the standard normal curve from −1.67 to 0.67
= Area under the standard normal curve from −1.67 to 0 + Area under the standard normal curve from 0 to 0.67
= Area under the standard normal curve from 0 to 1.67 + Area under the standard normal curve from 0 to 0.67
= A(1.67) + A(0.67)
= 0.6997
Problem 97. State Central limit theorem. Use the theorem to evaluate P [50 <
X̄ < 56] where X̄ represents the mean of a random sample of size 100 from an
infinite population with mean µ = 53 and variance σ 2 = 400
Hence σ = 20
$$ P[50 < \bar{X} < 56] = P\left(\frac{50 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{56 - \mu}{\sigma/\sqrt{n}}\right) $$
= P(−1.5 < z < 1.5)
= Area under the standard normal curve from −1.5 to 1.5
= 2A(1.5) (by symmetry)
= 2 × 0.4332
= 0.8664
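Table look-ups of this kind can be checked with Python's standard library, treating X̄ as normal with mean µ and S.D. σ/√n. This is a sketch under that assumption; the helper name `sample_mean_distribution` is hypothetical, and the figures are those of Problem 97.

```python
import math
from statistics import NormalDist

def sample_mean_distribution(mu, sigma, n):
    """Approximate distribution of the sample mean under the CLT."""
    return NormalDist(mu, sigma / math.sqrt(n))

dist = sample_mean_distribution(53, 20, 100)
prob = dist.cdf(56) - dist.cdf(50)
print(round(prob, 4))    # 0.8664
```

The `cdf` difference plays the role of the area A(z) read from the standard normal table.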
Problem 98. Certain tubes manufactured by a company have mean life time of 800
hours and S.D of 60 hours. Find the probability that a random sample of 16 tubes
taken from the group will have a mean life time (a) between 790 hours and 810 hours.
(b) less than 785 hours. (c) more than 820 hours. (d) between 770 hours and 830
hours.
Solution: By data µ = 800, σ = 60, n = 16 ∴ σx̄ = σ/√n = 60/4 = 15
(a)
$$ P(790 < \bar{x} < 810) = P\left(\frac{790 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{810 - \mu}{\sigma/\sqrt{n}}\right) $$
= P(−0.67 < z < 0.67)
= 2P(0 < z < 0.67)
= 2(0.2486)
= 0.4972
Thus P(790 < x̄ < 810) = 0.4972
(b)
$$ P(\bar{x} < 785) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{785 - \mu}{\sigma/\sqrt{n}}\right) $$
= P(z < −1)
= P(z > 1)
= A(∞) − A(1)
= 0.5 − 0.3413
= 0.1587
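(c)
$$ P(\bar{x} > 820) = P\left(z > \frac{820 - 800}{15}\right) = P(z > 1.33) $$
= 0.5 − 0.4082
= 0.0918
(d)
$$ P(770 < \bar{x} < 830) = P\left(\frac{770 - \mu}{\sigma/\sqrt{n}} < \frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < \frac{830 - \mu}{\sigma/\sqrt{n}}\right) = P(-2 < z < 2) $$
= 2P(0 < z < 2)
= 2(0.4772)
= 0.9544
Thus P(770 < x̄ < 830) = 0.9544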
Problem 99. A random sample of size 64 is taken from an infinite population having
mean 112 and variance 144. Using central limit theorem, find the probability of
getting the sample mean X̄ greater than 114.5
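Solution: Given n = 64, µ = 112, σ² = 144, so σ = 12 and σ/√n = 12/8 = 1.5.
$$ P(\bar{X} > 114.5) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{114.5 - \mu}{\sigma/\sqrt{n}}\right) = P\left(z > \frac{114.5 - 112}{1.5}\right) $$
= P(z > 1.66)
= A(∞) − A(1.66)
= 0.5 − 0.4515
= 0.0485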
Problem 100. In a recent study reported on the Flurry Blog, the mean age of tablet
users is 34 years. Suppose the standard deviation is 15 years. Take a sample of size
n = 100. Using central limit theorem, find the probability that the sample mean
age is more than 30 years.
S.D. of the population ⇒ σ = 15
$$ P(\bar{X} > 30) = P\left(\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} > \frac{30 - \mu}{\sigma/\sqrt{n}}\right) = P\left(z > \frac{30 - 34}{15/10}\right) $$
= P[z > −2.66]
= Area from −2.66 to ∞ (under the standard normal curve)
= Area from −2.66 to 0 + Area from 0 to ∞
= A(2.66) + 0.5
= 0.9961
The t statistic is defined as
$$ t = \frac{\bar{x} - \mu}{s}\sqrt{n} $$
where µ is the mean of the population,
$$ \bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i $$
is the mean of the sample and
$$ s^2 = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})^2 $$
is the variance of the sample.
We compute the test statistic under H0 and compare it with the tabulated value of t for (n − 1) d.f. at the given level of significance.
If calculated |t| < tα, then H0 is accepted and we say that the difference between the sample mean and the population mean is not significant at level α.
Variance
$$ s^2 = \frac{1}{n-1}\sum (x - \bar{x})^2 $$
⇒ s² = (1/9) × 1833.6
⇒ s² = 203.73333
⇒ s = 14.2735
Given the mean of population µ = 100, we have
$$ t = \frac{\bar{x} - \mu}{s}\sqrt{n} = \frac{97.2 - 100}{14.2735}\sqrt{10} = \frac{-2.8}{14.2735}\sqrt{10} = -0.6203 $$
Since |t| = 0.6203 < t0.05 = 2.262, we accept H0, i.e. the data support the assumption of a population mean IQ = 100.
x        x − x̄      (x − x̄)²
...      ...         ...
0        −2.6        6.76
−2       −4.6        21.16
1        −1.6        2.56
5        2.4         5.76
0        −2.6        6.76
4        1.4         1.96
6        3.4         11.56
Σx = 31              Σ(x − x̄)² = 104.92
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{104.92}{12 - 1} = 9.54 $$
s = 3.08
The t statistic is
$$ t = \frac{\bar{x} - \mu}{s}\sqrt{n} = \frac{2.6 - 0}{3.08}\sqrt{12} = \frac{2.6}{3.08} \times 3.464 = 2.924 $$
As the computed value of t, i.e., 2.924, is greater than t0.05 = 2.201 for 11 d.f., we reject H0 and conclude that as a result of the stimulus blood pressure will increase.
Problem 103. Ten individuals are chosen at random from the population and their heights, in inches, are found to be 63, 63, 64, 65, 66, 69, 69, 70, 70, 71. Discuss the suggestion that the mean height in the universe is 65 inches, given that for 9 degrees of freedom the value of Student's 't' at 0.05 level of significance is 2.262. [VTU: Dec 2018, DEC/JAN 16, Jan 2014, June 2012]
But t0.05 = 2.262 at 9 d.f. (given)
i.e. |t| = 2.02 < t0.05 = 2.262
The difference is not significant at the 0.05 level, so H0 is accepted and we conclude that the mean height is 65 inches.
Problem 104. Nine items have values 45, 47, 50, 52, 48, 47, 49, 53, 51. Does the
mean of these differ significantly from assumed mean of 47.5?(ν = 8, t0.05 =
2.31) [VTU:JUNE/JULY-15, July 2013, 2010]
x          45     47     50    52    48     47     49     53     51
x − x̄     −4.11  −2.11  0.89  2.89  −1.11  −2.11  −0.11  3.89   1.89
(x − x̄)²  16.89  4.45   0.79  8.35  1.23   4.45   0.01   15.13  3.57
$$ \bar{x} = \frac{\sum x}{n} = \frac{442}{9} = 49.11; \qquad \sum (x - \bar{x})^2 = 54.89 $$
$$ s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = 6.86 \quad \therefore s = 2.6192 $$
$$ t = \frac{\bar{x} - \mu}{s/\sqrt{n}} = \frac{49.11 - 47.5}{2.6192/\sqrt{9}} = 1.84 $$
But t0.05 = 2.31 for ν = 8
Conclusion : since |t| < t0.05, the hypothesis is accepted,
i.e. the mean of the items does not differ significantly from the assumed mean of 47.5.
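The one-sample t statistic can be computed from raw data with Python's `statistics` module, whose `stdev` uses the n − 1 divisor as in the definition of s above. A minimal sketch with the data of Problem 104; the function name `one_sample_t` is an assumption.

```python
import math
import statistics

def one_sample_t(data, pop_mean):
    """Student's t for testing a sample mean against an assumed mean."""
    n = len(data)
    mean = statistics.fmean(data)
    s = statistics.stdev(data)          # sample S.D. with divisor n - 1
    return (mean - pop_mean) / (s / math.sqrt(n))

values = [45, 47, 50, 52, 48, 47, 49, 53, 51]
t = one_sample_t(values, 47.5)
print(round(t, 3))       # 1.845 (1.84 by hand with rounded intermediates)
print(abs(t) < 2.31)     # True: not significant for 8 d.f. at the 5% level
```

The tabulated value 2.31 is passed in by hand because the standard library has no t-distribution table.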
Problem 105. A machinist is making engine parts with axle diameter of 0.7 inch.
A random sample of 10 parts shows mean diameter of 0.742 inch with a standard
deviation of 0.04 inch. On the basis of this sample, would you say that the work is
inferior? [VTU 2009]
For ν = 9, the tabulated value is t0.05 = 2.262.
As the calculated value of |t| > t0.05, we reject H0 and conclude that the work is inferior.
The confidence limits for the mean of the population corresponding to a given large sample (i.e. n ≥ 30) are
$$ \bar{x} \pm z_\alpha(\sigma/\sqrt{n}) $$
where σ is the population S.D. and n is the sample size.
In particular, 95% confidence limits for the mean of the population corresponding to a given sample are
$$ \bar{x} \pm 1.96(\sigma/\sqrt{n}) $$
and 99% confidence limits for the mean are
$$ \bar{x} \pm 2.58(\sigma/\sqrt{n}). $$
The confidence limits for the mean of the population corresponding to a given small sample are
$$ \bar{x} \pm t_\alpha(s/\sqrt{n}) $$
where s is the sample S.D., n is the sample size and tα is the table value of t for level α and n − 1 degrees of freedom.
Problem 106. A random sample of size 25 from a normal distribution N(µ, σ² = 4) yields sample mean X̄ = 78.3. Obtain a 99% confidence interval for µ.
Hence the 99% confidence interval for µ is (77.1812, 79.4188).
Problem 107. The heights of a random sample of 50 college students showed a mean
of 174.5 centimeters and a standard deviation of 6.9 centimeters. Construct a 98%
confidence interval for the mean height of all college students.
Solution: Here n = 50 (large)
and x̄ = 174.5, s = 6.9
Since n is large, we shall use the z-distribution.
For 98% confidence, we have level of significance α = 2% = 0.02.
For this level, the table value of z is z0.02 = 2.326 (refer standard normal table).
$$ \text{confidence interval} = \left(\bar{x} - z \times \frac{\sigma}{\sqrt{n}},\ \bar{x} + z \times \frac{\sigma}{\sqrt{n}}\right) = \left(174.5 - 2.326 \times \frac{6.9}{\sqrt{50}},\ 174.5 + 2.326 \times \frac{6.9}{\sqrt{50}}\right) $$
= (172.23, 176.77)
Problem 108. Deduce that for a random sample of 16 values with mean 41.5 inches and the sum of the squares of the deviations from the mean 135 inches², drawn from a normal population, 95% confidence limits for the mean of the population are 39.9 and 43.1 inches.
Here s² = 135/(16 − 1) = 9, so s = 3.
For the 5% level, the table value of t for 15 d.f. is tα = t0.05 = 2.131.
$$ \text{confidence limits} = \bar{x} - t_\alpha\frac{s}{\sqrt{n}} \ \text{and}\ \bar{x} + t_\alpha\frac{s}{\sqrt{n}} $$
= 41.5 − 2.131 × 3/√16 and 41.5 + 2.131 × 3/√16
= 41.5 − 1.598 and 41.5 + 1.598
= 39.9 and 43.1
Hence the required confidence limits are 39.9 and 43.1 inches.
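Small-sample confidence limits can be sketched in Python as follows. The function name `t_confidence_limits` is hypothetical, and the tabulated t value must be supplied from a t-table because the standard library does not provide one; the figures are those of Problem 108.

```python
import math

def t_confidence_limits(sample_mean, s, n, t_value):
    """Confidence limits x̄ ± t·s/√n, with t_value read from a t-table."""
    margin = t_value * s / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin

low, high = t_confidence_limits(41.5, 3.0, 16, 2.131)
print(round(low, 1), round(high, 1))    # 39.9 43.1
```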
Problem 109. Let the observed value of the mean X̄ of a random sample of size 20
from a normal distribution with mean µ and variance σ 2 = 80 be 81.2 . Find a
90% and a 95% confidence intervals for µ.
Problem 110. Suppose scores on exams in statistics are normally distributed with
an unknown population mean and a population standard deviation of 3 points. A
random sample of 36 scores is taken and gives a sample mean of 68 points. Find a
95% confidence interval for the mean exam score.
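$$ \text{Lower Limit} = \bar{x} - z \times \frac{\sigma}{\sqrt{n}} = 68 - 1.9599\ldots \times \frac{3}{\sqrt{36}} = 67.02 $$
$$ \text{Upper Limit} = \bar{x} + z \times \frac{\sigma}{\sqrt{n}} = 68 + 1.9599\ldots \times \frac{3}{\sqrt{36}} = 68.98 $$
Hence the 95% confidence interval is (67.02, 68.98)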
Problem 111. Below are given the gain in weights of cows fed on two diets A and B
Gain in weight:
Diet A: 25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25
Diet B: 44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22
Test if the two diets differ significantly as regards their effect on increase in weight.
Here n1 = 12, n2 = 15,
Σx1 = 336, Σx2 = 450,
Σ(x1 − x̄1)² = 380, Σ(x2 − x̄2)² = 1410.
$$ \bar{x}_1 = \frac{1}{n_1}\sum_{i=1}^{n_1} x_{1i} = \frac{336}{12} = 28, \qquad \bar{x}_2 = \frac{1}{n_2}\sum_{i=1}^{n_2} x_{2i} = \frac{450}{15} = 30 $$
Let the level of significance be 5% with n1 + n2 − 2 degrees of freedom. Tabulated t0.05 for (12 + 15 − 2) = 25 d.f. = 2.06
14 −14 196 31 1 1
32 4 16 40 10 100
24 −4 16 30 0 0
30 2 4 32 2 4
31 3 9 35 5 25
35 7 49 18 −12 144
25 −3 9 21 9 81
35 5 25
29 −1 1
22 8 64
(X1 − x̄1 )2 (X2 − x̄2 )2
P P P P
X1 X2
= 336 = 450 = 1410
1 h X 2
X 2
i
S2 = (x1 − x̄1 ) + (x2 − x̄2 )
n1 + n2 − 2
1
= [380 + 1410]
12 + 15 − 2
AJ
1
= [1790] = 71.6
25
x̄1 − x̄2
t= r
2 1 1
S n1 + n2
28 − 30
= q
1 1
71.6 12 + 15
−2 −2 −2
= p = √ = = −0.61
71.6(0.083 + 0.066) 10.66 3.26
|t| = 0.61
d.f. n1 + n2 − 2 = 25 and table value of t for 25 d.f is 2.060 for 5% level of
significance.
Since calculated value is less than table value at 5% level of significance, we Accept
the hypothesis.
Hence, we conclude that the two diets do not differ significantly as regards their
effect on increase in weight.
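The pooled-variance t statistic used above can be reproduced directly from the raw data. The following is a minimal sketch; the function name pooled_t is our own, and it follows the formula of these notes (pooled S² with n1 + n2 − 2 degrees of freedom).

```python
from math import sqrt


def pooled_t(sample1, sample2):
    """Return (t, pooled variance S^2, degrees of freedom) for two samples."""
    n1, n2 = len(sample1), len(sample2)
    mean1 = sum(sample1) / n1
    mean2 = sum(sample2) / n2
    # Sums of squared deviations from each sample mean
    ss1 = sum((x - mean1) ** 2 for x in sample1)
    ss2 = sum((x - mean2) ** 2 for x in sample2)
    df = n1 + n2 - 2
    s2 = (ss1 + ss2) / df
    t = (mean1 - mean2) / sqrt(s2 * (1 / n1 + 1 / n2))
    return t, s2, df


# Data of Problem 111
diet_a = [25, 32, 30, 34, 24, 14, 32, 24, 30, 31, 35, 25]
diet_b = [44, 34, 22, 10, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22]

t, s2, df = pooled_t(diet_a, diet_b)
print(round(t, 2), round(s2, 1), df)
```

The value of |t| is then compared with the tabulated t for df degrees of freedom, exactly as in the worked solution.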
Problem 112. Samples of two types of electric light bulbs were tested for length of
life and the following data were obtained:
                 Type - I           Type - II
Sample size      n1 = 8             n2 = 7
Sample means     x̄1 = 1234 hrs     x̄2 = 1036 hrs
Sample S.D.      s1 = 36 hrs        s2 = 40 hrs
Is the difference in the means sufficient to warrant that type I is superior to type II
regarding length of life?
\[
\begin{aligned}
S^2 &= \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2 - 2} = \frac{8(36)^2 + 7(40)^2}{8 + 7 - 2} \\
&= \frac{1}{13}\,[8 \times 1296 + 7 \times 1600] = \frac{1}{13}\,[10368 + 11200] = \frac{1}{13}\,[21568] = 1659.08
\end{aligned}
\]
\[
\begin{aligned}
t &= \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{S^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}
= \frac{1234 - 1036}{\sqrt{1659.08\left(\frac{1}{8} + \frac{1}{7}\right)}} \\
&= \frac{198}{\sqrt{1659.08 \times 0.2679}} = 9.39
\end{aligned}
\]
|t| = 9.39
Since calculated value is greater than table value at 5% level of significance with
n1 + n2 − 2 degrees of freedom, we Reject the hypothesis. Hence, we conclude
that two types of bulbs differ significantly and type I is definitely superior to type II.
Problem 113. A group of boys and girls were given an intelligence test. The mean
score, S.D. score and numbers in each group are as follows.
         Boys   Girls
Mean     74     70
SD       8      10
n        12     10
Is the difference between the means of the two groups significant at 5% level of
significance? (t0.05 = 2.086 for 20 d.f.)
Solution:(Solve it yourself !)
Hint steps: s2 = 88.4, s = 9.402, t = 0.994
Problem 114. Two types of batteries are tested for their length of life and the fol-
lowing results were obtained.
Battery A : n1 = 10, x̄1 = 500 hrs, σ1² = 100
Battery B : n2 = 10, x̄2 = 560 hrs, σ2² = 121
Compute Student’s t and test whether there is a significant difference in the two
means.
Problem 115. From a random sample of 10 pigs fed on diet A, the increases in
weight in a certain period were 10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lbs. For another
random sample of 12 pigs fed on diet B, the increases in the same period were
7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17lbs. Test whether diets A and B differ
significantly as regards their effect on increases in weight?
Solution:(Solve it yourself !)
Hint steps: x̄ = 12, ȳ = 15,
s2 = 21.1, s = 4.65, t = 1.6
Problem 116. Two horses A and B were tested according to the time (In Seconds) to
run a particular race with the following results :
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29 −
Test whether you can discriminate between the two horses.
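4.6 Testing of hypothesis: Chi-square test
\[
\chi^2 = \frac{(O_1 - E_1)^2}{E_1} + \frac{(O_2 - E_2)^2}{E_2} + \frac{(O_3 - E_3)^2}{E_3} + \cdots + \frac{(O_n - E_n)^2}{E_n}
\quad\Rightarrow\quad
\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}
\]
where O_i and E_i are the observed and expected frequencies and \(\sum O_i = \sum E_i = N\) (total frequency).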
Chi – square test as a test of goodness of fit: χ2 Test helps us to test the goodness
of fit of the distributions such as Binomial, Poisson and Normal distributions. If
the calculated value of χ2 is less than the table value of χ2 at a specified level of
significance, the hypothesis is accepted. Otherwise the hypothesis is rejected.
Conditions for applying chi square test
(i) No theoretical(Expected) frequency should be smaller than 5. If any theoretical
cell frequency is less than 5, then for the application of χ2 test, the difficulty is
overcome by grouping two or more classes together before calculating (Oi − Ei ).
It is important to remember that the number of degrees of freedom is determined
with the number of classes after regrouping.
(ii) ΣOi = ΣEi = N (total frequency)
Problem 117. In an experiment of pea breeding, the following frequencies of seed
were obtained.
Round-yellow Wrinkled Yellow Round Green Wrinkled Green Total
315 101 108 32 556
Can you say that the experiment is in agreement with the theory which predicts pro-
portion of frequencies, 9 : 3 : 3 : 1 ? Given that χ20.05 = 7.815 for 3 d.f. [VTU
July 2013]
Solution: Let H0 : The experimental result support the theory i.e., there is no sig-
nificant difference between the observed and theoretical frequency. Under H0 , the
corresponding frequencies can be calculated as
\[
\frac{9}{16} \times 556 = 312.75 \approx 313, \qquad
\frac{3}{16} \times 556 = 104.25 \approx 104,
\]
\[
\frac{3}{16} \times 556 = 104.25 \approx 104, \qquad
\frac{1}{16} \times 556 = 34.75 \approx 35
\]
Hence the table of observed and theoretical frequencies is,
Solution: Null Hypothesis, H0 : The accidents are uniformly distributed over the
week.
Under this H0, the expected frequency of accidents on each of these days = 84/7 = 12.
Observed and expected frequencies of the accidents are given below :
Oi : 14 16 8 12 11 9 14 Sum= 84
Ei : 12 12 12 12 12 12 12 Sum= 84
\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \\
&= \frac{(14 - 12)^2}{12} + \frac{(16 - 12)^2}{12} + \frac{(8 - 12)^2}{12} + \frac{(12 - 12)^2}{12}
+ \frac{(11 - 12)^2}{12} + \frac{(9 - 12)^2}{12} + \frac{(14 - 12)^2}{12} \\
&= \frac{1}{12}\,[4 + 16 + 16 + 0 + 1 + 9 + 4] = \frac{50}{12} = 4.17
\end{aligned}
\]
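The χ² statistic above is a single sum, so it is easy to verify by machine. A minimal sketch follows; the helper name chi_square is our own and is not part of any library.

```python
def chi_square(observed, expected):
    """Sum of (O - E)^2 / E over all classes."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


# Accident data: under H0 every day has the same expected frequency
observed = [14, 16, 8, 12, 11, 9, 14]
expected = [sum(observed) / len(observed)] * len(observed)

stat = chi_square(observed, expected)
print(round(stat, 2))
```

The same helper can be reused for any goodness-of-fit problem once the expected frequencies have been worked out.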
Problem 119. Fit a poisson distribution for the following data and test the goodness
of fit in 5% level of significance. Given that χ20.05 = 7.815 for 3 d.f
x 0 1 2 3 4
frequency 122 60 15 2 1
The Poisson distribution and the theoretical frequencies are calculated using
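\[
N\,p(x) = 200 \times \frac{\lambda^x e^{-\lambda}}{x!} = 121.3 \times \frac{\lambda^x}{x!}
\qquad (\because\ 200\,e^{-0.5} = 121.3)
\]
Hence theoretical frequencies are
\[
121.3 \times \frac{(0.5)^0}{0!} = 121, \quad
121.3 \times \frac{(0.5)^1}{1!} = 61, \quad
121.3 \times \frac{(0.5)^2}{2!} = 15, \quad
121.3 \times \frac{(0.5)^3}{3!} = 3, \quad
121.3 \times \frac{(0.5)^4}{4!} = 0
\]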
Therefore new table with observed and expected(theoretical) frequencies is
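\[
\begin{aligned}
\chi^2 &= \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \\
&= \frac{(122 - 121)^2}{121} + \frac{(60 - 61)^2}{61} + \frac{(15 - 15)^2}{15} + \frac{(3 - 3)^2}{3} \\
&= 0.025 < \chi^2_{0.05} = 7.815
\end{aligned}
\]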
Therefore the fitness is considered good.
∴ The hypothesis that the fitness is good can be accepted.
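The fitting step of Problem 119 (estimate λ by the sample mean, then compute N p(x)) can be sketched as below. The variable names are our own; rounding to whole numbers mirrors the worked solution.

```python
from math import exp, factorial

# Data of Problem 119
x_values = [0, 1, 2, 3, 4]
observed = [122, 60, 15, 2, 1]

N = sum(observed)                                        # total frequency, 200
lam = sum(x * f for x, f in zip(x_values, observed)) / N  # sample mean

# Theoretical (expected) frequencies N * e^(-lam) * lam^x / x!
expected = [N * exp(-lam) * lam ** x / factorial(x) for x in x_values]
print(lam, [round(e) for e in expected])
```

Classes with small expected frequency are then pooled before computing χ², which is why the last two classes are combined in the solution.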
Problem 120. Fit a binomial distribution for the data
No. of Heads 0 1 2 3 4
Frequency 122 60 15 2 1
and also test the goodness of fit given that χ20.05 = 7.815 for 3 d.f. (VTU Model
2020)
The expected frequencies are calculated by using N p(x), where N = 200 and
\[
p(x) = {}^{n}C_x\, p^x q^{n-x}, \qquad x = 0, 1, 2, 3, 4
\]
\[
\begin{aligned}
N P(0) &= (200)\; {}^{4}C_0\, p^0 q^{4-0} = 117.23 \approx 117 \\
N P(1) &= (200)\; {}^{4}C_1\, p^1 q^{4-1} = 66.99 \approx 67 \\
N P(2) &= (200)\; {}^{4}C_2\, p^2 q^{4-2} = 14.36 \approx 14 \\
N P(3) &= (200)\; {}^{4}C_3\, p^3 q^{4-3} = 1.36 \approx 2 \quad \text{(adjusted to get } N = 200\text{)} \\
N P(4) &= (200)\; {}^{4}C_4\, p^4 q^{4-4} = 0.05 \approx 0
\end{aligned}
\]
No. of heads 0 1 2 3 4 5
Frequency 6 27 72 112 71 32
Test the hypothesis that the data follow a binomial distribution. (ν = 5, χ²0.05 = 11.07)
[VTU:JUNE/JULY-15, Jan 2015, July 2013, 2004]
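\[
\begin{aligned}
N P(2) &= 320 \times {}^{5}C_2\, p^2 q^3 = 100 \\
N P(3) &= 320 \times {}^{5}C_3\, p^3 q^2 = 100 \\
N P(4) &= 320 \times {}^{5}C_4\, p^4 q = 50 \\
N P(5) &= 320 \times {}^{5}C_5\, p^5 q^0 = 10
\end{aligned}
\]
Thus, expected frequencies Ei are respectively 10, 50, 100, 100, 50, 10.
Table of Observed and Expected frequencies is
Oi 6 27 72 112 71 32 Sum = 320
Ei 10 50 100 100 50 10 Sum = 320
\[
\begin{aligned}
\chi^2 &= \sum_i \frac{(O_i - E_i)^2}{E_i} \\
&= \frac{(6 - 10)^2}{10} + \frac{(27 - 50)^2}{50} + \frac{(72 - 100)^2}{100}
+ \frac{(112 - 100)^2}{100} + \frac{(71 - 50)^2}{50} + \frac{(32 - 10)^2}{10} \\
&= 78.68.
\end{aligned}
\]
As the calculated value is very much higher than the tabulated value of χ²0.05 =
11.07 for ν = 5 d.f., we reject the null hypothesis and accept the alternate hypothesis
that the data do not follow the binomial distribution.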
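The expected frequencies 10, 50, 100, 100, 50, 10 correspond to p = q = 1/2, so the whole test can be reproduced as follows. This is a sketch under that assumption (fair coins); the variable names are our own.

```python
from math import comb

# 5 coins tossed 320 times
observed = [6, 27, 72, 112, 71, 32]
n = 5                 # coins per toss
N = sum(observed)     # number of tosses, 320
p = 0.5               # assumed probability of a head (fair coin)

# Expected frequencies N * nCx * p^x * q^(n-x)
expected = [N * comb(n, x) * p ** x * (1 - p) ** (n - x) for x in range(n + 1)]

chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
print(expected, round(chi2, 2))
```

Since the computed χ² far exceeds 11.07, the code agrees with the conclusion that the binomial hypothesis is rejected.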
Problem 122. Four coins are tossed 100 times & following Results were obtained.
No. of heads 0 1 2 3 4
Frequency 5 29 36 25 5
Fit a binomial distribution for the data and test the goodness of fit. (ν = 4, χ20.05 =
9.49) [VTU: Model 2020, Dec 2018]
\[
\begin{aligned}
N P(0) &= N \times {}^{4}C_0\, (0.5)^0 (0.5)^{4-0} = 100 \times 0.0625 = 6.25 \approx 6 \\
N P(1) &= N \times {}^{4}C_1\, (0.5)^1 (0.5)^{4-1} = (100)(0.25) = 25 \\
N P(2) &= N \times {}^{4}C_2\, (0.5)^2 (0.5)^{4-2} = (100)(0.375) = 37.5 \approx 38 \\
N P(3) &= N \times {}^{4}C_3\, (0.5)^3 (0.5)^{4-3} = (100)(0.25) = 25 \\
N P(4) &= N \times {}^{4}C_4\, (0.5)^4 (0.5)^{4-4} = (100)(0.0625) = 6.25 \approx 6
\end{aligned}
\]
Hence the table of observed and theoretical frequencies is,
Oi 5 29 36 25 5 Sum = 100
Ei 6 25 38 25 6 Sum = 100
\[
\chi^2 = \sum_i \frac{(O_i - E_i)^2}{E_i}
\quad\Rightarrow\quad
\chi^2 = \frac{1}{6} + \frac{16}{25} + \frac{4}{38} + 0 + \frac{1}{6} = 1.0786
\]
∴ χ² = 1.0786 < χ²0.05 = 9.49 for 4 d.f.
Hence we accept H0 and conclude that the fitness is good.
Problem 123. A die was thrown 60 times and the following frequency distribution
was observed:
Faces 1 2 3 4 5 6
Frequency 15 6 4 7 11 17
Test whether the die is unbiased at 5% significance level.
Solution: The frequencies in the given data are the observed frequencies.
Assuming that the die is unbiased, the expected frequency for each of the faces
1, 2, 3, 4, 5, 6 is 60/6 = 10.
Now the data is as follows:
x 1 2 3 4 5 6
Oi 15 6 4 7 11 17
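Ei 10 10 10 10 10 10
\[
\begin{aligned}
\chi^2 &= \sum_i \frac{(O_i - E_i)^2}{E_i} \\
&= \frac{(15 - 10)^2}{10} + \frac{(6 - 10)^2}{10} + \frac{(4 - 10)^2}{10}
+ \frac{(7 - 10)^2}{10} + \frac{(11 - 10)^2}{10} + \frac{(17 - 10)^2}{10} \\
&= \frac{1}{10}\,[25 + 16 + 36 + 9 + 1 + 49] = \frac{136}{10} = 13.6
\end{aligned}
\]
For ν = 6 − 1 = 5 d.f., χ²0.05 = 11.07. Since 13.6 > 11.07, we reject the hypothesis
that the die is unbiased.
4.7 F-test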
Problem 124. The I.Q.’s of 25 students from one college showed a variance of 16
and those of an equal number from the other college had a variance of 8 . Discuss
whether there is any significant difference in variability of intelligence.
Solution: σ1² = 16, σ2² = 8
\[
F = \frac{\sigma_1^2}{\sigma_2^2} = \frac{16}{8} = 2
\]
Tabulated value of F at 5% level of significance = 1.98
Calculated value of F = 2 > Tabulated value of F (1.98)
Hence, variability of intelligence is just significant at 5% level of significance.
Tabulated value of F at 1% level of significance = 2.62
Calculated value F = 2 < Tabulated value of F (2.62)
Hence, variability of intelligence is not significant at 1% level of significance.
Problem 125. Two samples of sizes 9 and 8 give the sum of squares of deviations
from their respective means equal to 160 inches² and 91 inches² respectively. Can
these be regarded as drawn from the same normal population?
Problem 126. Two independent samples of sizes 7 and 6 have the following values :
Sample A: 28 30 32 33 33 29 34
Sample B : 29 30 30 24 27 29
Examine whether the samples have been drawn from normal populations having the
same variance?[Given that the values of F at 5% level for (6, 5) d.f. is 4.95 and
for (5, 6) d.f. is 4.391 ]
Solution: H0 : The samples have been drawn from normal populations having the
same variance.
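\[
\bar{x} = \frac{219}{7} = 31.285 \quad\text{and}\quad \bar{y} = \frac{169}{6} = 28.166.
\]
Then
\[
\begin{aligned}
S_1^2 &= \frac{1}{n_1 - 1} \sum (x_i - \bar{x})^2 \\
&= \frac{1}{6}\big[(28 - 31.285)^2 + (30 - 31.285)^2 + (32 - 31.285)^2 + (33 - 31.285)^2 \\
&\qquad + (33 - 31.285)^2 + (29 - 31.285)^2 + (34 - 31.285)^2\big] \\
&= \frac{1}{6}\,[10.791 + 1.651 + 0.511 + 2.941 + 2.941 + 5.221 + 7.371] = 5.238
\end{aligned}
\]
and
\[
\begin{aligned}
S_2^2 &= \frac{1}{n_2 - 1} \sum (y_i - \bar{y})^2 \\
&= \frac{1}{5}\big[(29 - 28.166)^2 + (30 - 28.166)^2 + (30 - 28.166)^2 + (24 - 28.166)^2 \\
&\qquad + (27 - 28.166)^2 + (29 - 28.166)^2\big] \\
&= \frac{1}{5}\,[0.695 + 3.364 + 3.364 + 17.355 + 1.359 + 0.695] = 5.366.
\end{aligned}
\]
Therefore, the test statistic is given by
\[
F = \frac{S_2^2}{S_1^2} = \frac{5.366}{5.238} = 1.025.
\]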
Further, since the larger variance S2² (numerator) has 5 degrees of freedom and S1²
(denominator) has 6, we have from the table F0.05 (5, 6) = 4.39. Since the calculated
value is less than the tabular value, we accept H0 .
i.e. The samples have been drawn from normal populations having the same vari-
ance.
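The variance ratio of Problem 126 can be cross-checked with the standard library, whose statistics.variance uses the same n − 1 divisor as S1² and S2² above. This is only a sketch; the variable names are our own.

```python
from statistics import variance

# Data of Problem 126
sample_a = [28, 30, 32, 33, 33, 29, 34]
sample_b = [29, 30, 30, 24, 27, 29]

var_a = variance(sample_a)   # unbiased estimate, divisor n1 - 1
var_b = variance(sample_b)   # unbiased estimate, divisor n2 - 1

# Larger variance goes in the numerator, as required for the F-test
f_ratio = max(var_a, var_b) / min(var_a, var_b)
print(round(var_a, 3), round(var_b, 3), round(f_ratio, 3))
```

The critical value still has to be read from an F table for the matching pair of degrees of freedom.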
Problem 127. The nicotine content (in mg ) of two samples of tobacco were found
to be as follows :
Sample A: 24 27 26 21 25
Sample B: 27 30 28 31 22 36
Can it be said that the two samples came from the same population?
Solution: H0 : The samples have been drawn from normal populations having the
same variance.
Let x̄ be the sample mean of sample B and ȳ be the sample mean of sample A. Then
\[
\bar{x} = \frac{174}{6} = 29 \quad\text{and}\quad \bar{y} = \frac{123}{5} = 24.6
\]
\[
\begin{aligned}
S_1^2 &= \frac{1}{n_1 - 1} \sum (x_i - \bar{x})^2 \\
&= \frac{1}{5}\big[(27 - 29)^2 + (30 - 29)^2 + (28 - 29)^2 + (31 - 29)^2 + (22 - 29)^2 + (36 - 29)^2\big] \\
&= \frac{1}{5}\,[4 + 1 + 1 + 4 + 49 + 49] = 21.6,
\end{aligned}
\]
\[
\begin{aligned}
S_2^2 &= \frac{1}{n_2 - 1} \sum (y_i - \bar{y})^2 \\
&= \frac{1}{4}\big[(24 - 24.6)^2 + (27 - 24.6)^2 + (26 - 24.6)^2 + (21 - 24.6)^2 + (25 - 24.6)^2\big] \\
&= \frac{1}{4}\,[0.36 + 5.76 + 1.96 + 12.96 + 0.16] = 5.3.
\end{aligned}
\]
Therefore, the statistic for the F-test is
\[
F = \frac{S_1^2}{S_2^2} = \frac{21.6}{5.3} = 4.08.
\]
But tabular value of F (5, 4) = 6.26.
Since the calculated value is less than the tabular value, we accept H0 .
i.e. The samples have been drawn from normal populations having the same variance
Question Bank:Module 4
1. An electrical firm manufactures light bulbs that have a length of life that is ap-
proximately normally distributed, with mean equal to 800 hours and a standard
3. An unknown distribution has a mean of 90 and a standard deviation of 15 . A
sample size of 80 is drawn randomly from the population. Find the probability
that the sum of the 80 values (or the total of the 80 values) is more than 7400 .
Ans: 0.0681
5. State Central limit theorem. Use the theorem to evaluate P [50 < X̄ < 56]
where X̄ represents the mean of a random sample of size 100 from an infinite
population with mean µ = 53 and variance σ 2 = 400 (VTU Model 2023)
Ans:0.8664
6. Certain tubes manufactured by a company have mean life time of 800 hours and
S.D of 60 hours. Find the probability that a random sample of 16 tubes taken
from the group will have a mean life time (a) between 790 hours and 810 hours.
(b) less than 785 hours. (c) more than 820 hours. (d) between 770 hours and
830 hours. Ans:0.4972, 0.1587, 0.0918, 0.9544
8. In a recent study reported on the Flurry Blog, the mean age of tablet users is
34 years. Suppose the standard deviation is 15 years. Take a sample of size
n = 100. Using central limit theorem, find the probability that the sample
mean age is more than 30 years. (VTU Model 2023) Ans:0.9961
10. Ten individuals are chosen at random from the population and their heights are
found to be 63, 63, 64, 65, 66, 69, 69, 70, 70, 71 inches. Discuss the suggestion
that the mean height in the universe is 65 inches, given that for 9 degrees of
freedom the value of Student's t at 0.05 level of significance is 2.262.
Ans: |t| = 2.02 < t0.05 = 2.262
11. Nine items have values 45, 47, 50, 52, 48, 47, 49, 53, 51. Does the mean of
these differ significantly from assumed mean of 47.5?(ν = 8, t0.05 = 2.31)
[VTU:JUNE/JULY-15, July 2013, 2010] Ans: |t| = 1.84 < t0.05 = 2.31
12. A machinist is making engine parts with axle diameter of 0.7 inch. A random
sample of 10 parts shows mean diameter of 0.742 inch with a standard devia-
tion of 0.04 inch. On the basis of this sample, would you say that the work is
inferior? Ans: t = 3.32 > t0.05 .
14. A group of boys and girls were given an intelligence test. The mean score, S.D.
score and numbers in each group are as follows.
Boys Girls
Mean 74 70.
SD 8 10
n 12 10
Is the difference between the means of the two groups significant at 5% level
of significance t.05 = 2.086. for 20 d.f. Ans:t = 0.994
15. Two types of batteries are tested for their length of life and the following results
were obtained.
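Battery A : n1 = 10, x̄1 = 500 hrs, σ1² = 100
Battery B : n2 = 10, x̄2 = 560 hrs, σ2² = 121
Compute Student's t and test whether there is a significant difference in the two
means.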
16. From a random sample of 10 pigs fed on diet A, the increases in weight in a
certain period were 10, 6, 16, 17, 13, 12, 8, 14, 15, 9 lbs. For another ran-
dom sample of 12 pigs fed on diet B, the increases in the same period were
7, 13, 22, 15, 12, 14, 18, 8, 21, 23, 10, 17lbs. Test whether diets A and B
differ significantly as regards their effect on increases in weight? Ans:t = 1.6
17. Two horses A and B were tested according to the time (In Seconds) to run a
particular race with the following results :
Horse A 28 30 32 33 33 29 34
Horse B 29 30 30 24 27 29 −
Test whether you can discriminate between the two horses. Ans:t = 2.42
18. Suppose that 10, 12, 16, 19 is a sample taken from a normal population with
variance 6.25. Find a 95% confidence interval for the population mean.
19. A sample of size 15 drawn from a normally distributed population has sample
mean 35 and sample standard deviation 14. Construct a 95% confidence interval
for the population mean. Ans : (27.2, 42.8)
20. A random sample of 12 students from a large university yields mean GPA 2.71
with sample standard deviation 0.51. Construct a 90% confidence interval for
the mean GPA of all students at the university. Assume that the numerical
population of GPAs from which the sample is taken has a normal distribution.
Ans :(2.45, 2.97).
21. For a sample of 10 cows that had recently given birth, the mean protein content
in 40oz of milk was 3.4 g with a sample standard deviation of 0.4 g. Find a
95% confidence interval for the mean protein content of the milk. Ans:
(3.19,3.61)
22. A simple random sample of 20 statistics students during a statistics exam gives
an average pulse rate 74.4 with a standard deviation of 10. (a) Find 90%, 95%,
99% confidence intervals for the average pulse rate of all statistics students
during an exam. Ans : (70.53,78.27), (69.72,79.08) and (68.00,80.80)
23. 100 packs were tested and the mean weight was 24.1 grams. Calculate a 99%
confidence interval for the population mean µ. Ans: (23.69, 24.51)
24. In an experiment of pea breeding, the following frequencies of seed were ob-
tained.
Round-yellow Wrinkled Yellow Round Green Wrinkled Green Total
315 101 108 32 556
Can you say that the experiment is in agreement with the theory which predicts
proportion of frequencies, 9 : 3 : 3 : 1 ? Given that χ²0.05 = 7.815 for 3 d.f.
Ans: χ² = 0.51 < χ²0.05 = 7.815
25. The following table gives the number of aircraft accidents that occurred during
the various days of the week. Find whether the accidents are uniformly distributed
over the week. (χ²0.05 = 12.59 for 6 d.f.)
Number of accidents: 14 16 8 12 11 9 14 Total 84
Ans: χ² = 4.17 < χ²0.05 = 12.59
26. A die was thrown 60 times and the following frequency distribution was observed:
Faces 1 2 3 4 5 6
Frequency 15 6 4 7 11 17
Test whether the die is unbiased at 5% significance level. Ans: χ² = 13.6
27. Fit a Poisson distribution for the following data and test the goodness of fit at
5% level of significance. Given that χ²0.05 = 7.815 for 3 d.f.
x 0 1 2 3 4
frequency 122 60 15 2 1
28. Fit a binomial distribution for the data above and also test the goodness of fit
given that χ²0.05 = 7.815 for 3 d.f. Ans: χ² = 1.516 < χ²0.05 = 7.815
29. A set of 5 similar coins tossed 320 times gives following table:
No. of heads 0 1 2 3 4 5
Frequency 6 27 72 112 71 32
30. Four coins are tossed 100 times & following Results were obtained.
No. of heads 0 1 2 3 4
Frequency 5 29 36 25 5
Fit a binomial distribution for the data and test the goodness of fit. (ν =
4, χ20.05 = 9.49) Ans: χ2 = 1.0786 < χ20.05 = 9.49
31. The I.Q.’s of 25 students from one college showed a variance of 16 and those of
an equal number from the other college had a variance of 8. Discuss whether
there is any significant difference in variability of intelligence. Use 1% and 5%
levels of significance. Ans: F = 2 > F0.05 = 1.98 and F = 2 < F0.01 = 2.62
32. Two samples of sizes 9 and 8 give the sum of squares of deviations from their
respective means equal to 160 inches² and 91 inches² respectively. Can these
be regarded as drawn from the same normal population?
Ans: F = 1.54 < F0.05 = 3.73
33. Two independent samples of sizes 7 and 6 have the following values :
Sample A: 28 30 32 33 33 29 34
Sample B : 29 30 30 24 27 29
Examine whether the samples have been drawn from normal populations having
the same variance? Ans:F = 1.025
34. In two groups of ten children each, the increase in weight due to different diets
during the same period, were in pounds
3, 7, 5, 6, 5, 4, 4, 5, 3, 6
8, 5, 7, 8, 3, 2, 7, 6, 5, 7.
Is there a significant difference in their variability? Ans:F = 2.4 < 3.2
Design of Experiments & ANOVA
Syllabus
Almost all experiments involve the three basic principles, viz., randomization, repli-
cation and local control.
The Principle of Replication suggests repeating experiments to get more reliable
results. Instead of testing a treatment on just one subject, we apply it to many. For
example, if comparing two types of rice, instead of growing each in only one part
of a field, we grow them in several parts. This makes our conclusions more depend-
able. While this may introduce some computational challenges, it is essential for
improving accuracy and getting more precise results. Replication helps in reducing
the impact of random variations, making our findings more precise.
The Principle of Randomization is like a shield for experiments. It suggests mak-
ing choices randomly to protect against sneaky outside factors. For example, when
planting different rice types, instead of deciding where each one goes, we use ran-
domness. This helps us avoid problems like uneven soil fertility. By using random-
ization, we ensure that any hidden factors are balanced out by chance, giving us more
accurate results. It’s like letting luck even things out, making our conclusions more
trustworthy.
The Principle of Local Control is like a detective strategy for experiments. Imagine
you are testing different rice types and want to rule out the impact of varying
soil fertility. Here’s what you do: divide the field into similar sections
(homogeneous blocks) and split each block into parts for each rice type. Randomly
assign treatments within each block to spread out soil fertility differences. This way,
you can precisely measure and remove the impact of soil fertility from your final
results. In simple terms, local control helps you focus on the rice types by keeping
other factors, like soil fertility, in check.
Example:
Consider a plant growth experiment:
Experimental Unit: Individual plants.
Treatments: Varying amounts of sunlight exposure.
Randomization: Randomly assign plants to different sunlight levels.
Replication: Repeat the experiment with multiple plants for each sunlight level.
Local Control: Ensure consistent soil conditions for all plants.
By applying these principles, we can draw conclusions about how different sunlight
levels impact plant growth while minimizing the influence of other variables.
this technique, one can draw inferences about whether the samples have been drawn
from populations having the same mean.
In ANOVA, we have to make two estimates of population variance:
(i) one based on between samples variance (cause variance or treatment variance)
and
(ii) the other based on within samples variance (chance variance, error variance,
random variance or residual variance).
Then the said two estimates of population variance are compared with the F-test, given
by
\[
F = \frac{\text{Estimate of population variance based on between samples variance}}
{\text{Estimate of population variance based on within samples variance}}
\]
Example:
Consider a study comparing the effectiveness of three teaching methods:
Between Samples Variance: Examines how much the mean scores differ among the
three teaching methods.
Within Samples Variance: Assesses the variability in scores within each teaching
method group.
Using the F-test, we can determine if the observed differences in mean scores among
teaching methods are statistically significant or merely due to chance.
A completely randomized design (CRD) is one where the treatments are assigned
completely at random to experimental units, so that each experimental unit has the
same chance of receiving any particular treatment.
Example:
Consider a study evaluating the growth of plants with three different fertilizers. In a
CRD, each plant is randomly assigned one of the three fertilizers. This ensures that
any observed differences in plant growth are due to the fertilizers and not because of
any intentional patterns in how we assigned the fertilizers. We want to be sure that
any changes in growth are connected to the specific fertilizers and not due to how we
chose which plant gets which fertilizer.
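The random assignment described in this example can be sketched in a few lines. This is an illustration only: the function name completely_randomized_design, the plant labels and the fertilizer labels F1, F2, F3 are made up, and the sketch assumes equal replication (the number of units is a multiple of the number of treatments).

```python
import random


def completely_randomized_design(units, treatments, seed=None):
    """Assign treatments to units completely at random with equal replication."""
    if len(units) % len(treatments) != 0:
        raise ValueError("number of units must be a multiple of the number of treatments")
    replications = len(units) // len(treatments)
    labels = list(treatments) * replications   # each treatment repeated equally
    rng = random.Random(seed)                  # seed only makes the layout reproducible
    rng.shuffle(labels)                        # the randomization step
    return dict(zip(units, labels))


plants = [f"plant_{i}" for i in range(1, 13)]  # 12 hypothetical plants
layout = completely_randomized_design(plants, ["F1", "F2", "F3"], seed=42)
for plant, fertilizer in layout.items():
    print(plant, fertilizer)
```

Because the labels are shuffled as a whole, every plant has the same chance of receiving any fertilizer, which is the defining property of a CRD.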
Under the one-way ANOVA, we consider only one factor or property, or charac-
teristic, and then determine if there are differences within that factor. The experi-
mental units are randomly assigned to different levels of the single factor or treat-
ment. The analysis involves comparing the means of different treatment groups.
For example, we might want to know if three different groups of students have dif-
ferent mean marks. To see if there is a statistically significant difference in mean
marks, we can conduct a one-way ANOVA.
The technique involves the following steps:
Consider k samples x1 , x2 , x3 , · · · , xk .
Let x11 , x12 , x13 , · · · , x1n1 be the observations in the first sample.
Let x21 , x22 , x23 , · · · , x2n2 be the observations in the second sample.
In this way let xk1 , xk2 , xk3 , · · · , xknk be the observations in the kth sample.
Steps involved in one way ANOVA:
Step 7. The sum of squares within the samples can be found out by subtracting SSB
from TSS.
SSE= TSS-SSB
Step 8. The degrees of freedom for total sum of squares (TSS) is (N − 1). The
degrees of freedom for SSB is (k − 1) and the degrees of freedom for SSE
is (N − k)
Step 9. Mean sum of squares : The mean sum of squares for treatments is
\[
S_1^2 = \frac{SSB}{k - 1}
\]
and the mean sum of squares for error is
\[
S_2^2 = \frac{SSE}{N - k}.
\]
Step 10. ANOVA Table: The above sum of squares together with their respective
degrees of freedom and mean sum of squares can be summarized in the
ANOVA table in the following way.
Step 11. Calculation of F : Variance ratio F is the ratio between greater variance
and smaller variance, thus
\[
F = \frac{S_1^2}{S_2^2}
\]
If variance within the treatment is more than the variance between the treatments,
then numerator and denominator should be interchanged and degrees
of freedom adjusted accordingly.
Step 12. Critical value of F or table value of F : The Critical value of F or table
value of F is obtained from F distribution table for (k − 1, N − k) d.f at
5% level of significance.
Step 13. Inference: If calculated F value is less than table value of F , we accept our
null hypothesis H0 and say that there is no significant difference between
treatments.
If calculated F value is greater than table value of F , we reject our H0 and
say that the difference between treatments is significant.
Problem 128. A test was given to five students taken at random from the fifth class
of three schools of a town. The individual scores are
School I 9 7 6 5 8
School II 7 4 5 4 5
School III 6 5 6 7 6
Solution:
To carry out the analysis of variance, we form the following tables.
              Scores           Total     Squares
School I      9 7 6 5 8        35        1225
School II     7 4 5 4 5        25        625
School III    6 5 6 7 6        30        900
Total                          T = 90    2750
Table of Squares:
School I 81 49 36 25 64
School II 49 16 25 16 25
School III 36 25 36 49 36
Alternative Hypothesis H1 : µ1 ≠ µ2 ≠ µ3
Level of significance : Let α = 0.05
Test statistic:
\[
\text{Correction factor (C.F.)} = \frac{G^2}{N} = \frac{90^2}{15} = \frac{8100}{15} = 540
\]
\[
\text{Total sum of squares (TSS)} = \sum\sum x_{ij}^2 - \text{C.F.} = 568 - 540 = 28
\]
\[
\text{Sum of squares between schools (SST)} = \sum_i \frac{T_i^2}{n_i} - \text{C.F.} = \frac{2750}{5} - 540 = 550 - 540 = 10
\]
\[
\text{Sum of squares due to error (SSE)} = \text{TSS} - \text{SST} = 28 - 10 = 18
\]
ANOVA Table
Sources of variation   d.f.           S.S    M.S.S           F ratio
Between Schools        3 − 1 = 2      10     10/2 = 5.0      5/1.5 = 3.33
Error                  12             18     18/12 = 1.5
Total                  15 − 1 = 14    28
Table Value: Table value of Fe for (2, 12) d.f at 5% level of significance is 3.8853
Inference: Since calculated F0 is less than table value of Fe , we may accept our H0
and say that there is no significant difference between the performance of schools.
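The one-way ANOVA steps (correction factor, TSS, sum of squares between samples, SSE, mean squares and F) can be sketched as a single function. The function name one_way_anova and the dictionary keys are our own; the sum of squares between samples is returned under the key "SSB", which is the quantity called SST in the solution above.

```python
def one_way_anova(groups):
    """One-way ANOVA via the correction-factor method used in these notes."""
    k = len(groups)                              # number of samples
    N = sum(len(g) for g in groups)              # total number of observations
    grand_total = sum(sum(g) for g in groups)
    cf = grand_total ** 2 / N                    # correction factor
    tss = sum(x * x for g in groups for x in g) - cf
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf
    sse = tss - ssb
    msb = ssb / (k - 1)                          # mean square between samples
    mse = sse / (N - k)                          # mean square for error
    return {"CF": cf, "TSS": tss, "SSB": ssb, "SSE": sse,
            "MSB": msb, "MSE": mse, "F": msb / mse, "df": (k - 1, N - k)}


# Scores of Problem 128
schools = [
    [9, 7, 6, 5, 8],   # School I
    [7, 4, 5, 4, 5],   # School II
    [6, 5, 6, 7, 6],   # School III
]
result = one_way_anova(schools)
print(result)
```

The function also works for samples of unequal size, such as the data of Problem 129, because each sample total is divided by its own sample size.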
Problem 129. Three processes A, B and C are tested to see whether their outputs are
equivalent. The following observations of outputs are made:
A 10 12 13 11 10 14 15 13
B 9 11 10 12 13
C 11 10 15 14 12 13
Solution:Solve yourself !
Hint: T=228, CF=2736, TSS=58, SSB=7,
F=1.097 < F(2,16)=3.63, accept H0
Aggregate: 1 2 3 4 5
551 595 639 417 563
457 580 615 449 631
450 508 511 517 522
731 583 573 438 613
499 633 648 415 656
632 517 677 555 679
Total 3320 3416 3663 2791 3664 16, 854
Solution:Solve yourself !
Hint:
SST = 209,377, SSA = 85,356
SSE = 209,377 − 85,356 = 124,021.
F = 4.30 > F0.05 = 2.76, REJECT H0
Solution:Solve yourself !
Hint: T=258, CF=5547, TSS=39, SSB=15,
Problem 132. Set up an analysis of variance table for the following per acre produc-
tion data for three varieties of wheat, each grown on 4 plots and state if the variety
differences are significant.
Solution:Solve yourself !
Hint: T=60, TSS=32, SSB=8, SSE=24, F = 1.5 < F (2, 9) = 4.26, ACCEPT H0
Problem 133. The varieties of wheat A, B, C were sown in 4 plots each and the
following yields in quintals per acre were obtained.
A 8 4 6 7
B 7 6 5 3
C 2 5 4 4
Test the significance of difference between the yields of varieties, given that 5%
tabulated value of F for 2 and 9 degrees of freedom is 4.26 .
Solution:Solve yourself !
Hint: T=61, CF=310.08, TSS=34.92, SSB=12.67, SSE=22.25,
Outputs
Machine I Machine II Machine III
10 9 20
15 7 16
11 5 10
10 6 14
Given that the value of F at 5% level of significance for (2, 9)d. f is 4.26
Solution:Solve yourself !
Hint: CF=1704.08, TSS=284.92, SSB=162.17, SSE=122.75,
Problem 135. A manufacturing company has purchased three new machines of dif-
ferent makes and wishes to determine whether one of them is faster than the others in
Producing a certain output. Five hourly production figures are observed at random
from each other machine and the results are given below:
Observation A1 A2 A3
1 25 31 24
2 30 39 30
3 36 38 28
4 38 42 25
5 31 35 28
Use ANOVA and determine whether the machines are significantly different in their
mean speed.
Solution:Solve yourself ! Hint : Since the values are slightly larger, you can use the
coding method. The previous method can also be used.
Problem 136. Three different kinds of food are tested on three groups of rats for 5
weeks. The objective is to check the difference in mean weight (in grams) of the rats
per week. Apply one-way ANOVA using a 0.05 significance level to the following
data:
Food 1 8 12 19 8 6 11
Food 2 4 5 4 6 9 7
Food 3 11 8 7 13 7 9
Solution:Solve yourself !
Hint: Ans: N=18, T=154, cf=1316, TSS=230, SSR=75,
Two-way ANOVA technique is used when the data are classified on the basis of two
factors. Here, the experimental units are grouped into blocks based on one factor,
and treatments are then randomly assigned within each block. For example,
seeds and also on the basis of different varieties of fertilizers used.
• A business firm may have its sales data classified on the basis of different sales-
men and also on the basis of sales in different regions.
• In a factory, the various units of a product produced during a certain period may
be classified on the basis of different varieties of machines used and also on the
basis of different grades of labour.
Such a two-way design may have repeated measurements of each factor or may not
have repeated values. We shall now explain the two-way ANOVA technique in the
context of both the said designs with the help of examples.
Step 6. Take the total of different rows, Ti and then obtain the square of each row
total (i.e. Ti2 ) and divide such squared values of each row by the number
of items in the concerning row(i.e. ni ) and take the total of the result thus
obtained. Finally, subtract the correction factor from this total to obtain the
sum of squares of deviations for variance between rows (SSR).
\[
SSR = \sum_i \frac{T_i^2}{n_i} - \text{C.F.}, \qquad i = 1, 2, 3, \ldots, r,
\]
where r is the number of rows (the subscript i represents the ith row).
Step 7. Take the total of different columns, Tj and then obtain the square of each
column total (i.e. Tj2 ) and divide such squared values of each column by the
number of items in the concerning column(i.e. nj ) and take the total of the
result thus obtained. Finally, subtract the correction factor from this total to
obtain the sum of squares of deviations for variance between columns (SSC).
\[
SSC = \sum_j \frac{T_j^2}{n_j} - \text{C.F.}, \qquad j = 1, 2, 3, \ldots, c,
\]
where c is the number of columns (the subscript j represents the jth column).
Step 8. The sum of squares of deviations for residual or error can be found by
subtracting the sum of SSR and SSC from TSS.
SSE = TSS − (SSR + SSC)
Step 10. Mean sum of squares : The mean sum of squares for rows is
\[
S_R^2 = \frac{SSR}{r - 1}
\]
The mean sum of squares for columns is
\[
S_C^2 = \frac{SSC}{c - 1}
\]
and the mean sum of squares for error is
\[
S_E^2 = \frac{SSE}{(c - 1)(r - 1)}
\]
Step 11. ANOVA Table: The above sum of squares together with their respective
degrees of freedom and mean sum of squares can be summarized in the
Two-way ANOVA table in the following way.
Source of variation           Sum of squares (SS)   Degrees of freedom (d.f.)   Mean square (MS)                F-ratio
Between rows (treatment)      SSR                   (r − 1)                     S_R² = SSR/(r − 1)              F_R = S_R²/S_E²
Between columns (treatment)   SSC                   (c − 1)                     S_C² = SSC/(c − 1)              F_C = S_C²/S_E²
Residual or error             SSE                   (c − 1)(r − 1)              S_E² = SSE/((c − 1)(r − 1))
Total                         TSS                   (N − 1)
In the table
c = number of columns
r = number of rows
SSE (residual or error) = TSS (Total) − (SSR + SSC)
2
SR
Step 12. Calculation of F : F-ratio concerning variation between rows is FR = 2
SE
2
SC
F-ratio concerning variation between rows is FC = 2
SE
In these two cases, if the numerator variance is less than the denominator
variance, then numerator and denominator should be interchanged and de-
grees of freedom should be adjusted accordingly.
Step 13. Conclusion : If the F-ratio concerning variation between rows (i.e. $F_R$) > its table
value, then the difference among row means is considered significant. Similarly,
the F-ratio concerning variation between columns can be interpreted.
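The steps above can be sketched in Python. This is a minimal illustration, not a definitive implementation: the function name `two_way_anova` and the dictionary keys are assumptions made here, and the data are the worker-by-machine production figures of Problem 137, coded by subtracting 40. The sketch returns the plain ratios of mean squares; the interchange rule of Step 12 (when a ratio falls below 1) is left to the reader.

```python
def two_way_anova(table):
    """Two-way ANOVA without repeated values; `table` is a list of rows."""
    r, c = len(table), len(table[0])
    n = r * c
    grand_total = sum(sum(row) for row in table)
    cf = grand_total ** 2 / n                                  # correction factor T^2 / N
    tss = sum(x * x for row in table for x in row) - cf        # total sum of squares
    ssr = sum(sum(row) ** 2 for row in table) / c - cf         # every row holds c items
    ssc = sum(sum(col) ** 2 for col in zip(*table)) / r - cf   # every column holds r items
    sse = tss - (ssr + ssc)                                    # residual or error
    ms_r = ssr / (r - 1)
    ms_c = ssc / (c - 1)
    ms_e = sse / ((c - 1) * (r - 1))
    return {"CF": cf, "TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse,
            "F_R": ms_r / ms_e, "F_C": ms_c / ms_e}


# Rows are workers 1-5, columns are machine types A-D.
production = [
    [44, 38, 47, 36],
    [46, 40, 52, 43],
    [34, 36, 44, 32],
    [43, 38, 46, 33],
    [38, 42, 49, 39],
]
coded = [[x - 40 for x in row] for row in production]  # coding method
result = two_way_anova(coded)
for name, value in result.items():
    print(f"{name} = {value:.3f}")
```

The computed F-ratios are then compared with the tabulated F values for (r − 1, (c − 1)(r − 1)) and (c − 1, (c − 1)(r − 1)) degrees of freedom.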
In case of a two-way design with repeated measurements for all of the categories,
we can obtain a separate independent measure of inherent or smallest variations. For
this measure we can calculate the sum of squares and degrees of freedom in the
same way as we had worked out the sum of squares for variance within samples in
the case of one-way ANOVA. Total SS, SS between columns and SS between rows
can also be worked out as stated above. We then find left-over sums of squares
and left-over degrees of freedom which are used for what is known as ‘interaction
variation’ (Interaction is the measure of inter relationship among the two different
classifications). After making all these computations, ANOVA table can be set up
for drawing inferences.
Steps involved in Two-way ANOVA technique (when repeated values are there):
Step 6. Take the total of different rows, $T_i$, and then obtain the square of each row
total (i.e. $T_i^2$) and divide such squared values of each row by the number
of items in the concerning row (i.e. $n_i$) and take the total of the result thus
obtained. Finally, subtract the correction factor from this total to obtain the
sum of squares of deviations for variance between rows (SSR).
$$SSR = \sum_i \frac{T_i^2}{n_i} - C.F., \qquad i = 1, 2, 3, \dots, r,$$
where $r$ is the number of rows and the subscript $i$ represents the $i$th row.
Step 7. Take the total of different columns, $T_j$, and then obtain the square of each
column total (i.e. $T_j^2$) and divide such squared values of each column by the
number of items in the concerning column (i.e. $n_j$) and take the total of the
result thus obtained. Finally, subtract the correction factor from this total to
obtain the sum of squares of deviations for variance between columns (SSC).
$$SSC = \sum_j \frac{T_j^2}{n_j} - C.F., \qquad j = 1, 2, 3, \dots, c,$$
where $c$ is the number of columns and the subscript $j$ represents the $j$th column.
Step 8. The sum of squares of deviations for residual or error is the sum of squares
within the samples (error variation): for every cell (each row-column combination),
add up the squared deviations of its repeated observations from the cell mean $\bar{x}$.
$$SSE = \sum (x - \bar{x})^2$$
Step 9. Sum of Squares for interrelationship (or interaction) variation can be worked
out as under: SSI=TSS – (SSR + SSC + SSE)
Step 10. Degrees of freedom (d.f.) can be worked out as under:
The degrees of freedom for total sum of squares (TSS) = (N − 1)
The degrees of freedom for variance between rows (SSR) = (r − 1)
The degrees of freedom for variance between columns (SSC) = (c − 1)
The degrees of freedom for error variance (SSE) = N/2
The degrees of freedom for interaction variation (SSI) = N/2 − c − r + 1
where c = number of columns
and r = number of rows.
These expressions assume two observations in each cell, so that N = 2rc; in general
the error d.f. is N − rc and the interaction d.f. is (r − 1)(c − 1).
Step 11. Mean sum of squares : The mean sum of squares for rows is
$$S_R^2 = \frac{SSR}{r - 1}$$
the mean sum of squares for columns is
$$S_C^2 = \frac{SSC}{c - 1}$$
the mean sum of squares for interaction is
$$S_I^2 = \frac{SSI}{N/2 - c - r + 1}$$
and the mean sum of squares for error is
$$S_E^2 = \frac{SSE}{N/2}$$
Step 12. ANOVA Table: The above sums of squares, together with their respective
degrees of freedom and mean sums of squares, can be summarized in the
Two-way ANOVA table in the following way.
Source of variation     Sum of squares (SS)   Degrees of freedom   Mean square (MS)                    F-ratio
SS between columns      SSC                   (c − 1)              $S_C^2 = SSC/(c - 1)$               $F_C = S_C^2/S_E^2$
SS between rows         SSR                   (r − 1)              $S_R^2 = SSR/(r - 1)$               $F_R = S_R^2/S_E^2$
Interrelationship       SSI                   N/2 − c − r + 1      $S_I^2 = SSI/(N/2 - c - r + 1)$     $F_I = S_I^2/S_E^2$
In the table
c = number of columns
r = number of rows
Step 13. Calculation of F : The F-ratio concerning variation between rows is
$$F_R = \frac{S_R^2}{S_E^2}$$
the F-ratio concerning variation between columns is
$$F_C = \frac{S_C^2}{S_E^2}$$
and the F-ratio concerning interaction variation is
$$F_I = \frac{S_I^2}{S_E^2}$$
In these cases, if the numerator variance is less than the denominator
variance, then numerator and denominator should be interchanged and the
degrees of freedom should be adjusted accordingly.
Step 14. Conclusion : If the F-ratio concerning variation between rows (i.e. $F_R$) > its table
value, then the difference among row means is considered significant. Similarly,
the F-ratios concerning variation between columns and interaction can be interpreted.
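The procedure with repeated values can be sketched in Python as follows. This is an illustrative sketch under stated assumptions: the function name `two_way_anova_replicated` and the nested-list layout `cells[i][j]` (the list of repeated observations for row i and column j) are choices made here, every cell is assumed to hold the same number of observations, and the data are the blood-pressure reductions of Problem 143. The degrees of freedom are written in their general form, N − rc for error and (r − 1)(c − 1) for interaction; with two observations per cell these equal N/2 and N/2 − c − r + 1.

```python
def two_way_anova_replicated(cells):
    """Two-way ANOVA with repeated values; cells[i][j] lists the observations."""
    r, c = len(cells), len(cells[0])
    m = len(cells[0][0])                       # observations per cell (assumed equal)
    n = r * c * m
    all_x = [x for row in cells for cell in row for x in cell]
    cf = sum(all_x) ** 2 / n                   # correction factor T^2 / N
    tss = sum(x * x for x in all_x) - cf
    ssr = sum(sum(sum(cell) for cell in row) ** 2 for row in cells) / (c * m) - cf
    ssc = sum(sum(sum(cells[i][j]) for i in range(r)) ** 2
              for j in range(c)) / (r * m) - cf
    # Error: squared deviations of each observation from its own cell mean.
    sse = sum((x - sum(cell) / m) ** 2 for row in cells for cell in row for x in cell)
    ssi = tss - (ssr + ssc + sse)              # interaction (left-over) variation
    df_error = n - r * c
    df_inter = (r - 1) * (c - 1)
    ms_e = sse / df_error
    return {"TSS": tss, "SSR": ssr, "SSC": ssc, "SSE": sse, "SSI": ssi,
            "F_R": (ssr / (r - 1)) / ms_e,
            "F_C": (ssc / (c - 1)) / ms_e,
            "F_I": (ssi / df_inter) / ms_e}


# Rows: groups of people A, B, C; columns: drugs X, Y, Z; two readings per cell.
drug_cells = [
    [[14, 15], [10, 9], [11, 11]],
    [[12, 11], [7, 8], [10, 11]],
    [[10, 11], [11, 11], [8, 7]],
]
drug_result = two_way_anova_replicated(drug_cells)
for name, value in drug_result.items():
    print(f"{name} = {value:.2f}")
```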
Coding method is based on an important property of F-ratio that its value does not
change if all the n item values are either multiplied or divided by a common figure
or if a common figure is either added or subtracted from each of the given n item
values. Through this method big figures are reduced in magnitude by division or
subtraction and computation work is simplified without any disturbance on the F-
ratio. This method should be used specially when given figures are big or otherwise
inconvenient. Once the given figures are converted with the help of some common
values, then all the steps of the short-cut method (for both ONE-way ANOVA and
TWO-way ANOVA) stated above can be adopted for obtaining and interpreting F-
ratio.
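The invariance property behind the coding method can be checked numerically. The sketch below is an assumption-laden illustration: the helper `one_way_f` is a name introduced here, it computes the one-way F-ratio by the short-cut method, and the data are the hourly production figures of the three machines in Question 11 of the question bank. The same F-ratio should come out after subtracting a common figure or dividing by a common figure.

```python
import math


def one_way_f(groups):
    """F-ratio (between / within) for a one-way classification."""
    n = sum(len(g) for g in groups)
    cf = sum(sum(g) for g in groups) ** 2 / n              # correction factor
    tss = sum(x * x for g in groups for x in g) - cf       # total SS
    ssb = sum(sum(g) ** 2 / len(g) for g in groups) - cf   # SS between samples
    sse = tss - ssb                                        # SS within samples
    return (ssb / (len(groups) - 1)) / (sse / (n - len(groups)))


machines = [
    [25, 30, 36, 38, 31],   # A1
    [31, 39, 38, 42, 35],   # A2
    [24, 30, 28, 25, 28],   # A3
]
shifted = [[x - 30 for x in g] for g in machines]   # subtract a common figure
scaled = [[x / 5 for x in g] for g in machines]     # divide by a common figure

f_raw = one_way_f(machines)
f_shifted = one_way_f(shifted)
f_scaled = one_way_f(scaled)
print(f_raw, f_shifted, f_scaled)
```

The three printed values agree up to floating-point rounding, which is why coded figures can be used in place of the original ones.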
Problem 137. The following data represents the number of units of production per
day turned out by different workers using 4 different types of machines.
Machine Type
Workers A B C D
1 44 38 47 36
2 46 40 52 43
3 34 36 44 32
4 43 38 46 33
5 38 42 49 39
1. Test whether the five men differ with respect to mean productivity and
2. Test whether the mean productivity is the same for the four different machine
types.
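Solution : Let us take the null hypotheses H0 : the 5 workers (row factor) do
not differ with respect to mean productivity, and the mean productivity is the same
for the four different machines (column factor). To simplify the calculation, let us use
the coding method and subtract 40 from each value. The new values are
Workers      A     B     C     D    Total
1            4    -2     7    -4      5
2            6     0    12     3     21
3           -6    -4     4    -8    -14
4            3    -2     6    -7      0
5           -2     2     9    -1      8
Total        5    -6    38   -17     20
Correction factor, with grand total T = 20 and N = 20 observations:
$$C.F. = \frac{T^2}{N} = \frac{20^2}{20} = 20$$
Total sum of squares:
$$TSS = \sum_i \sum_j x_{ij}^2 - C.F. = 594 - 20 = 574$$
Sum of squares of deviations for variance between rows (SSR), with $n_i = 4$:
$$SSR = \sum_i \frac{T_i^2}{n_i} - C.F. = \frac{5^2 + 21^2 + (-14)^2 + 0^2 + 8^2}{4} - 20 = 161.5$$
Sum of squares of deviations for variance between columns (SSC), with $n_j = 5$:
$$SSC = \sum_j \frac{T_j^2}{n_j} - C.F. = \frac{5^2 + (-6)^2 + 38^2 + (-17)^2}{5} - 20 = 338.8$$
Residual sum of squares:
$$SSE = TSS - (SSR + SSC) = 574 - (161.5 + 338.8) = 73.7$$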
The degrees of freedom for residual variance (SSE) = (c − 1)(r − 1) = 12
where c = number of columns
and r = number of rows
Mean sum of squares : The mean sum of squares for rows is
$$S_R^2 = \frac{SSR}{r - 1} = \frac{161.5}{4} = 40.375$$
The mean sum of squares for columns is
$$S_C^2 = \frac{SSC}{c - 1} = \frac{338.8}{3} = 112.93$$
and the mean sum of squares for error is
$$S_E^2 = \frac{SSE}{(c - 1)(r - 1)} = \frac{73.7}{12} = 6.14$$
ANOVA Table:
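The F-ratios follow directly from the mean squares above. The worked lines below are a sketch of that last step; the 5% table values $F_{0.05}(4, 12) = 3.26$ and $F_{0.05}(3, 12) = 3.49$ are the ones quoted in the hint to the same problem in the question bank.

```latex
\begin{align*}
F_R &= \frac{S_R^2}{S_E^2} = \frac{40.375}{6.14} \approx 6.58 > F_{0.05}(4, 12) = 3.26 \\
F_C &= \frac{S_C^2}{S_E^2} = \frac{112.93}{6.14} \approx 18.39 > F_{0.05}(3, 12) = 3.49
\end{align*}
```

Both calculated values exceed their table values, so on this reading both null hypotheses are rejected: the workers differ in mean productivity, and the mean productivity is not the same for the four machine types.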
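Problem 138. Set up an analysis of variance table for the following two-way design
results:
Per Acre Production Data of Wheat (in metric tonnes)
Varieties of            Varieties of seeds
fertilizers          A     B     C
W                    6     5     5
X                    7     5     4
Y                    3     3     3
Z                    8     7     4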
Problem 139. The following table gives the monthly sales (in thousand rupees) of a
certain firm in three states by its four salesmen:
Salesmen
States Total
A B C D
X 5 4 4 7 20
Y 7 8 5 4 24
Z 9 6 6 7 28
Total 21 18 15 18 72
Set up an analysis of variance table for the above information. Calculate the F-coefficients
and state whether the difference between sales affected by the four salesmen and the
difference between sales affected in the three states are significant.
Problem 140. Perform a two-way ANOVA on the data given below:
Treatment 1
Treatment 2 1 2 3
1 30 26 38
2 24 29 28
3 33 24 35
4 36 31 30
5 27 35 33
Problem 141. A company appoints four salesmen A, B, C and D and observes their sales
in three seasons: summer, winter and monsoon. The figures (in lakhs of Rs) are given
in the following table:
                 Salesman
Seasons       A     B     C     D
Summer       45    40    38    37
Winter       43    41    45    38
Monsoon      39    39    41    41
Hint: FR = 1.87 < F (6, 2) = 19.3, FC = 1 < F (6, 3). Hence we accept the null hypothesis for both row and column factors,
i.e. there is no difference between the sales in the seasons and there is no difference between the sales of the 4 salesmen.
Problem 142. Three varieties of coal were analysed by four chemists and the ash-
content in the varieties was found to be as under.
Chemists
Varieties 1 2 3 4
A 8 5 5 7
B 7 6 4 4
C 3 6 5 4
Hint: FR = 1.22, FC = 2.27.
Problem 143. Set up ANOVA table for the following information relating to three
drugs testing to judge the effectiveness in reducing blood pressure for three different
groups of people:
Drug
Group of People X Y Z
A 14 10 11
15 9 11
B 12 7 10
11 8 11
C 10 11 8
11 11 7
Solution : Step (i) Here $n_i = 6$ for each row and $n_j = 6$ for each column.
Total number of observations, N = 18.
Step (ii) Grand total, $T = \sum_i T_i = \sum_j T_j = 187$;
thus the correction factor is
$$CF = \frac{T^2}{N} = \frac{187 \times 187}{18} = 1942.72$$
Step (iii) Total sum of squares,
$$TSS = \sum_i \sum_j (x_{ij})^2 - CF = 2019 - 1942.72 = 76.28$$
Step (iv) Sum of squared deviations between rows,
$$SSR \text{ (i.e., between groups of people)} = \sum_i \frac{T_i^2}{n_i} - C.F.$$
$$= \frac{70 \times 70}{6} + \frac{59 \times 59}{6} + \frac{58 \times 58}{6} - 1942.72$$
$$= 816.67 + 580.16 + 560.67 - 1942.72 = 14.78$$
Step (v) Sum of squared deviations between columns,
$$SSC \text{ (i.e., between drugs)} = \sum_j \frac{T_j^2}{n_j} - C.F.$$
$$= \frac{73 \times 73}{6} + \frac{56 \times 56}{6} + \frac{58 \times 58}{6} - 1942.72$$
$$= 888.16 + 522.66 + 560.67 - 1942.72 = 28.77$$
Step (vi) Error deviations, where $\bar{x}$ is the mean of the cell to which $x$ belongs,
$$SSE = \sum (x - \bar{x})^2$$
$$= (14 - 14.5)^2 + (15 - 14.5)^2 + (10 - 9.5)^2 + (9 - 9.5)^2 + (11 - 11)^2$$
$$+ (11 - 11)^2 + (12 - 11.5)^2 + (11 - 11.5)^2 + (7 - 7.5)^2 + (8 - 7.5)^2$$
$$+ (10 - 10.5)^2 + (11 - 10.5)^2 + (10 - 10.5)^2 + (11 - 10.5)^2$$
$$+ (11 - 11)^2 + (11 - 11)^2 + (8 - 7.5)^2 + (7 - 7.5)^2$$
$$= 3.50$$
Step (vii) Sum of squares for interaction variation,
$$SSI = TSS - (SSR + SSC + SSE)$$
$$= 76.28 - [28.77 + 14.78 + 3.50]$$
$$= 29.23$$
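The mean squares and F-ratios implied by these sums of squares can be sketched as below. The dictionary names are illustrative assumptions. The degrees of freedom follow the rules stated earlier (r − 1 = 2, c − 1 = 2, interaction 4, error 9), and the 5% table values used for comparison are F(2, 9) = 4.26, quoted elsewhere in these notes, and F(4, 9) = 3.63, taken here from a standard F table.

```python
# Sums of squares from the solution above and their degrees of freedom.
ss = {"rows (groups)": 14.78, "columns (drugs)": 28.77,
      "interaction": 29.23, "error": 3.50}
df = {"rows (groups)": 2, "columns (drugs)": 2, "interaction": 4, "error": 9}

# Mean square = sum of squares / degrees of freedom.
ms = {source: ss[source] / df[source] for source in ss}

# Each F-ratio divides a mean square by the error mean square.
f_ratio = {source: ms[source] / ms["error"] for source in ss if source != "error"}

# 5% table values: F(2, 9) for rows and columns, F(4, 9) for interaction.
critical = {"rows (groups)": 4.26, "columns (drugs)": 4.26, "interaction": 3.63}

for source, f in f_ratio.items():
    verdict = "significant" if f > critical[source] else "not significant"
    print(f"{source}: MS = {ms[source]:.3f}, F = {f:.2f}, "
          f"table value = {critical[source]} -> {verdict}")
```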
The above table shows that all the three F-ratios are significant at the 5% level, which
means that the drugs act differently, different groups of people are affected differently
and the interaction term is significant. In fact, if the interaction term happens to be
significant, it is pointless to talk about the differences between various treatments,
i.e., differences between drugs or differences between groups of people in the given
case.
Problem 144. The following table gives the monthly sales (in thousand rupees) of a
certain firm in three states by its four salesmen. Set up an ANOVA table for the
information given below. Calculate the F-coefficients and state whether the difference
between sales affected by the four salesmen and the difference between sales affected in
the three states are significant at the 5% level of significance.
Salesman
States
A B C D
X 5 4 4 7
3 5 9 8
Y 7 8 5 4
3 8 7 5
Z 9 6 6 7
5 4 3 1
Solution :
i. Row totals $T_i$ and column totals $T_j$ (each cell holds two observations):
                      Salesman
States        A      B      C      D     T_i    T_i^2
X             5      4      4      7      45     2025
              3      5      9      8
Y             7      8      5      4      47     2209
              3      8      7      5
Z             9      6      6      7      41     1681
              5      4      3      1
T_j          32     35     34     32     133
T_j^2      1024   1225   1156   1024
ii. Correction factor,
$$CF = \frac{T^2}{N} = \frac{133 \times 133}{24} = 737.042$$
iii. SS total deviation
$$TSS = [25 + 9 + 16 + 25 + 16 + 81 + 49 + 64 + 49 + 9 + 64 + 64 + 25 + 49 + 16 + 25 + 81 + 25 + 36 + 16 + 36 + 9 + 49 + 1] - CF$$
$$= 839 - 737.042 = 101.958$$
iv. SS between columns (i.e. between salesmen) deviation
$$SSC = \frac{32 \times 32}{6} + \frac{35 \times 35}{6} + \frac{34 \times 34}{6} + \frac{32 \times 32}{6} - \frac{133 \times 133}{24}$$
$$= 738.167 - 737.042 = 1.125$$
v. SS between rows (i.e. between states) deviation
$$SSR = \frac{45 \times 45}{8} + \frac{47 \times 47}{8} + \frac{41 \times 41}{8} - \frac{133 \times 133}{24}$$
$$= 739.375 - 737.042 = 2.333$$
vi. SS within samples (i.e. error) deviation
$$SSE = (1 + 1 + 0.25 + 0.25 + 6.25 + 6.25 + 0.25 + 0.25 + 4 + 4 + 0 + 0 + 1 + 1 + 0.25 + 0.25 + 4 + 4 + 1 + 1 + 2.25 + 2.25 + 9 + 9) = 58.5$$
vii. SS for interrelationship variation,
$$SSI = 101.958 - (1.125 + 2.333 + 58.5) = 40.000$$
Source of variation            Sum of squares (SS)   Degrees of freedom (DOF)   Mean square (MS)       F-ratio
SS between columns             1.125                 (4 − 1) = 3                1.125/3 = 0.375        4.875/0.375 = 13.00
SS between rows                2.333                 (3 − 1) = 2                2.333/2 = 1.167        4.875/1.167 = 4.18
SS of interrelationship        40.000                (4 − 1)(3 − 1) = 6         40.000/6 = 6.667       6.667/4.875 = 1.37
SS within samples (error)      58.500                (24 − 12) = 12             58.5/12 = 4.875
Total                          101.958               (24 − 1) = 23
For columns and rows the mean square is smaller than the error mean square, so the
ratio is inverted and the degrees of freedom are interchanged, as described in the steps above.
treatment occurs, more than once in any one row or any one column. The ANOVA
technique in case of Latin-square design remains more or less the same as we have
already stated in case of a two-way design, excepting the fact that the variance can
be split into four parts as under:
(i) variance between columns;
(ii) variance between rows;
(iii) variance between varieties;
(iv) residual variance.
All these above stated variances are worked out as under:
Step 3. Find the sum of observations in the $i$th row ($T_i$) and find the grand total,
$$T = \sum_i T_i = \sum_i \sum_j x_{ij}$$
Step 4. Find the correction factor, using
$$CF = \frac{T^2}{N}$$
Step 5. Find the squares of all observations and then find the total sum of squares,
$$TSS = \sum_i \sum_j (x_{ij})^2 - CF$$
Step 6. Find the sum of squares of deviations for variance between rows (SSR).
$$SSR = \sum_i \frac{T_i^2}{n_i} - CF$$
where $T_i$ is the sum of observations in the $i$th row.
Step 7. Find the sum of squares of deviations for variance between columns (SSC).
$$SSC = \sum_j \frac{(T_j)^2}{n_j} - CF$$
where $T_j$ is the sum of observations in the $j$th column.
Step 8. Find the sum of squares of deviations for variance between varieties (letters)
(SSV or SSL).
$$SSV \text{ or } SSL = \sum_v \frac{(T_v)^2}{n_v} - CF$$
where $T_v$ is the sum of observations of variety (letter) type $v$.
Step 10. Degrees of freedom (d.f.) can be worked out as under:
The degrees of freedom for total sum of squares (TSS) = (N − 1)
The degrees of freedom for variance between rows (SSR) = (r − 1)
The degrees of freedom for variance between columns (SSC) = (c − 1)
The degrees of freedom for variance between varieties/letters (SSL) = (v − 1)
Source of variation     SS     d.f.              MS                                  F-ratio
Between columns         SSC    c − 1             $S_C^2 = SSC/(c - 1)$               $F_C = S_C^2/S_E^2$
Between rows            SSR    r − 1             $S_R^2 = SSR/(r - 1)$               $F_R = S_R^2/S_E^2$
Between varieties       SSL    c − 1             $S_L^2 = SSL/(c - 1)$               $F_L = S_L^2/S_E^2$
Residual or error       SSE    (c − 1)(c − 2)    $S_E^2 = SSE/((c - 1)(c - 2))$
Total                   TSS    N − 1
Step 11. Conclusion can be drawn based on the calculated values of F compared with
tabulated value of F.
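The Latin-square computation can be sketched in Python as follows. The function name `latin_square_anova` and its two arguments (a grid of values and a matching grid of letters) are assumptions made for illustration. The sketch first checks that no letter occurs more than once in any row or column, then follows the steps above. The data are the wheat yields of the four-variety example, coded by subtracting 20.

```python
def latin_square_anova(values, letters):
    """ANOVA for a k x k Latin square; letters[i][j] is the treatment of values[i][j]."""
    k = len(values)
    # Each letter must occur exactly once in every row and every column.
    for line in list(letters) + list(zip(*letters)):
        if len(set(line)) != k:
            raise ValueError("not a Latin square")
    n = k * k
    total = sum(sum(row) for row in values)
    cf = total ** 2 / n                                       # correction factor
    tss = sum(x * x for row in values for x in row) - cf
    ssr = sum(sum(row) ** 2 for row in values) / k - cf       # between rows
    ssc = sum(sum(col) ** 2 for col in zip(*values)) / k - cf # between columns
    letter_totals = {}
    for value_row, letter_row in zip(values, letters):
        for x, letter in zip(value_row, letter_row):
            letter_totals[letter] = letter_totals.get(letter, 0) + x
    ssl = sum(t ** 2 for t in letter_totals.values()) / k - cf  # between letters
    sse = tss - ssr - ssc - ssl                                 # residual
    ms_e = sse / ((k - 1) * (k - 2))
    return {"CF": cf, "TSS": tss, "SSR": ssr, "SSC": ssc, "SSL": ssl, "SSE": sse,
            "F_R": ssr / (k - 1) / ms_e,
            "F_C": ssc / (k - 1) / ms_e,
            "F_L": ssl / (k - 1) / ms_e}


letters = ["CBAD", "ADCB", "BADC", "DCBA"]
yields = [[25, 23, 20, 20], [19, 19, 21, 18], [19, 14, 17, 20], [17, 20, 21, 15]]
coded_yields = [[x - 20 for x in row] for row in yields]   # coding method
wheat = latin_square_anova(coded_yields, letters)
for name, value in wheat.items():
    print(f"{name} = {value:.2f}")
```

Each F-ratio is then compared with the table value for (k − 1, (k − 1)(k − 2)) degrees of freedom, which is F(3, 6) = 4.76 at the 5% level for a 4 × 4 square.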
Problem 145. Analyse and interpret the following statistics concerning output of
wheat per field obtained as a result of an experiment conducted to test four varieties of
wheat, viz. A, B, C and D, under a Latin-square design.
C     B     A     D
25    23    20    20
A     D     C     B
19    19    21    18
B     A     D     C
19    14    17    20
D     C     B     A
17    20    21    15
Solution: Using the coding method, let us subtract 20 from the figures given in each
of the small squares and obtain the coded figures as under:
C     B     A     D
5     3     0     0
A     D     C     B
-1    -1    1     -2
B     A     D     C
-1    -6    -3    0
D     C     B     A
-3    0     1     -5
Squaring the coded values in various columns and rows we have the following table
of square terms:
C     B     A     D
25    9     0     0
A     D     C     B
1     1     1     4
B     A     D     C
1     36    9     0
D     C     B     A
9     0     1     25
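$$\text{Correction factor, } CF = \frac{(T)^2}{n} = \frac{(-12)(-12)}{16} = 9$$
$$\text{SS for total variance} = \sum (X_{ij})^2 - \frac{(T)^2}{n} = 122 - 9 = 113$$
SS for variance between columns
$$= \sum_j \frac{(T_j)^2}{n_j} - CF$$
$$= \frac{(0)^2}{4} + \frac{(-4)^2}{4} + \frac{(-1)^2}{4} + \frac{(-7)^2}{4} - 9$$
$$= \frac{66}{4} - 9 = 7.5$$
SS for variance between rows
$$= \sum_i \frac{(T_i)^2}{n_i} - CF$$
$$= \frac{(8)^2}{4} + \frac{(-3)^2}{4} + \frac{(-10)^2}{4} + \frac{(-7)^2}{4} - 9$$
$$= \frac{222}{4} - 9 = 46.5$$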
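SS for variance between varieties would be worked out as under:
For finding SS for variance between varieties, we first add up the coded
values belonging to each variety:
$T_A = 0 - 1 - 6 - 5 = -12$, $T_B = 3 - 2 - 1 + 1 = 1$, $T_C = 5 + 1 + 0 + 0 = 6$, $T_D = 0 - 1 - 3 - 3 = -7$
$$SSV = \sum_v \frac{(T_v)^2}{n_v} - CF = \frac{(-12)^2 + (1)^2 + (6)^2 + (-7)^2}{4} - 9 = \frac{230}{4} - 9 = 48.5$$
SS for residual variance = 113 − (7.5 + 46.5 + 48.5) = 10.50
d.f. for variance between columns = (c − 1) = (4 − 1) = 3
d.f. for variance between rows = (r − 1) = (4 − 1) = 3
d.f. for variance between varieties = (v − 1) = (4 − 1) = 3
d.f. for total variance = (n − 1) = (16 − 1) = 15
d.f. for residual variance = (c − 1)(c − 2) = (4 − 1)(4 − 2) = 6
ANOVA table can now be set up as shown below:
Source of variation     SS        d.f.    MS       F-ratio    5% F-limit
Between columns         7.50      3       2.50     1.43       F (3, 6) = 4.76
Between rows            46.50     3       15.50    8.85       F (3, 6) = 4.76
Between varieties       48.50     3       16.17    9.24       F (3, 6) = 4.76
Residual or error       10.50     6       1.75
Total                   113.00    15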
The above table shows that variance between rows and variance between varieties are
significant and not due to chance factor at 5% level of significance as the calculated
values of the said two variances are 8.85 and 9.24 respectively which are greater than
the table value of 4.76 . But variance between columns is insignificant and is due to
chance because the calculated value of 1.43 is less than the table value of 4.76 .
Problem 146. Analyse the variance in the following latin square of yields (in kos) of
paddy where A, B, C, D denote the different methods of Cultivation.
Examine whether the different methods of cultivation have given significantly differ-
ent yields. Given that F3,6 = 4.76.
iii) There is no difference between letters (methods of cultivation).
Using the coding method, subtract 120 from each value. N = 4 + 4 + 4 + 4 = 16
D
B
A
C
2
2
A
C
B
D
1
−1
3
C
A
D
B
3
1
B
D
C
A
2
2 2
$$\text{Correction factor} = \frac{T^2}{N} = \frac{(30)^2}{16} = 56.25$$
$$TSS = \sum x^2 - \frac{T^2}{N} = 92 - 56.25 = 35.75$$
$$SSR = \frac{(8)^2 + (14)^2 + (0)^2 + (8)^2}{4} - 56.25 = 24.75$$
$$SSC = \frac{(8)^2 + (6)^2 + (6)^2 + (10)^2}{4} - 56.25 = 2.75$$
For the letters, add the values belonging to each letter:
$\sum A = 1 + 2 + 0 + 2 = 5$, $\sum C = 3 + 3 + 1 + 2 = 9$,
$\sum B = 2 + 4 + (-1) + 1 = 6$, $\sum D = 2 + 5 + 0 + 3 = 10$.
$$SSL \text{ (varieties)} = \frac{(5)^2 + (6)^2 + (9)^2 + (10)^2}{4} - 56.25 = 4.25$$
$$SSE = TSS - SSR - SSC - SSL$$
$$SSE = 35.75 - 24.75 - 2.75 - 4.25 = 4$$
Source of variation     SS       d.f.    MS      F-ratio    5% F-limit
Between columns         2.75     3       0.92    1.37       F (3, 6) = 4.76
Between rows            24.75    3       8.25    12.31      F (3, 6) = 4.76
Between varieties       4.25     3       1.42    2.12       F (3, 6) = 4.76
Residual or error       4        6       0.67
Total                   35.75    15
∴ Hypothesis accepted for columns & letters (cultivation method), and Hypothesis
rejected for rows.
Question Bank
1. A test was given to five students taken at random from the fifth class of three
schools of a town. The individual scores are
School I 9 7 6 5 8
School II 7 4 5 4 5
School III 6 5 6 7 6
A: 20 18 19
B: 17 16 19 18
C: 20 21 20 19 18
Is there any significant difference in the production of the three varieties?
Ans: Calculated F = 9.11 < Table value of F (9, 2) = 19.3
3. Determine if there is a difference in the mean daily calcium intake for people
with normal bone density, osteopenia, and osteoporosis at a 0.05 alpha level.
The data was recorded as follows:
800 700 350
Hint: F = 1.395 < F (0.05, 2, 15) = 3.68, accept
4. Three processes A, B and C are tested to see whether their outputs are equiva-
lent. The following observations of outputs are made:
A 10 12 13 11 10 14 15 13
B 9 11 10 12 13
C 11 10 15 14 12 13
in hundreds of hours.
A B C D
20 25 24 23
19 23 20 20
21 21 22 20
7. Set up an analysis of variance table for the following per acre production data
for three varieties of wheat, each grown on 4 plots, and state if the variety
differences are significant.
8. The varieties of wheat A, B, C were sown in 4 plots each and the following
yields in quintals per acre were obtained.
A 8 4 6 7
B 7 6 5 3
C 2 5 4 4
Test the significance of difference between the yields of varieties, given that
5% tabulated value of F for 2 and 9 degrees of freedom is 4.26 .
9. Three types of fertilizers are used on three groups of plants for 5 weeks. We
want to check if there is a difference in the mean growth of each group. Using
the data given below, apply a one-way ANOVA test at the 0.05 significance level.
Fertilizer-1 6 8 4 5 3 4
Fertilizer-2 8 12 9 11 6 8
Fertilizer-3 13 9 11 8 7 12
Hint: CF=1152, TSS=152, SSR=84, SSE=68, F = 9.26 > F (0.05, 2, 15) = 3.68, reject
10. Three different machines are used for a production. On the basis of the out-
puts, set up one - way ANOVA table and test whether the machines are equally
effective.
Outputs
Machine I Machine II Machine III
10 9 20
15 7 16
11 5 10
10 6 14
Given that the value of F at 5% level of significance for (2, 9)d. f is 4.26
11. A manufacturing company has purchased three new machines of different makes
and wishes to determine whether one of them is faster than the others in
producing a certain output. Five hourly production figures are observed at random
from each machine and the results are given below:
Observation A1 A2 A3
1 25 31 24
2 30 39 30
3 36 38 28
4 38 42 25
5 31 35 28
Use ANOVA and determine whether the machines are significantly different in
their mean speed.
12. Three different kinds of food are tested on three groups of rats for 5 weeks. The
objective is to check the difference in mean weight (in grams) of the rats per
week. Apply one-way ANOVA using a 0.05 significance level to the following
data:
Food 1 8 12 19 8 6 11
Food 2 4 5 4 6 9 7
Food 3 11 8 7 13 7 9
Hint: CF=1317.56, TSS=228.44, SSB=73.44, SSE=155, S12 = 36.72, S22 = 10.33, F = 3.55 < F (0.05, 2, 15) = 3.68, Accept
13. A trial was run to check the effects of different diets. Positive numbers indicate
weight loss and negative numbers indicate weight gain. Check if there is an
average difference in the weight of people following different diets using an
ANOVA Table.
Low Fat    Low Calorie    Low protein    Low carbohydrate
8          2              3              2
9          4              5              2
6          3              4              -1
7          5              2              0
3          1              3              3
Hint: CF=252, TSS=123.2, SSR=75.8, SSE=47.4, F = 8.43 > F (0.05, 3, 16) = 3.24, reject
14. The following data represents the number of units of production per day turned
out by different workers using 4 different types of machines.
Machine Type
Workers A B C D
1 44 38 47 36
2 46 40 52 43
3 34 36 44 32
4 43 38 46 33
5 38 42 49 39
1. Test whether the five men differ with respect to mean productivity and
2. Test whether the mean productivity is the same for the four different machine
types.
Hint: Using coding, subtract 40 from each value. CF=20, TSS=574, SSR=161.5, SSC=338.8, SSE=73.7.
15. Set up an analysis of variance table for the following two-way design results:
Per Acre Production Data of Wheat (in metric tonnes)
Varieties of seeds A B C
Varieties of fertilizers
W 6 5 5
X 7 5 4
Y 3 3 3
Z 8 7 4
Also state whether variety differences are significant at 5% level.
16. The following data show the number of worms quarantined from the GI areas of
four groups of muskrats in a carbon tetrachloride anthelmintic study. Conduct
a two-way ANOVA test.
I II III IV
33 41 12 38
32 38 35 43
26 40 46 25
14 23 22 13
30 21 11 26
Hint: CF=48, TSS=2245, SSC=151, SSR=1056.25, SSE=1037.75, FR = 3.053, Fc = 1.718, Accept
17. The following table gives the monthly sales (in thousand rupees) of a certain
firm in three states by its four salesmen:
Salesmen
States Total
A B C D
X 5 4 4 7 20
Y 7 8 5 4 24
Z 9 6 6 7 28
Total 21 18 15 18 72
18. A company appoints four salesmen A, B, C and D and observes their sales in three
seasons: summer, winter and monsoon. The figures (in lakhs of Rs) are given in
the following table:
Salesman
Seasons
A B C D
Summer 45 40 38 37
Winter 43 41 45 38
Monsoon 39 39 41 41
19. Three varieties of coal were analysed by four chemists and the ash-content in
the varieties was found to be as under.
                Chemists
Varieties    1    2    3    4
A            8    5    5    7
B            7    6    4    4
C            3    6    5    4
Hint: CF=341.33, TSS=24.67, SSR=6.17, SSC=3.34, SSE=15.16, FR = 1.22, Fc = 2.27, Accept H0 for both row and column factors.
20. Set up ANOVA table for the following information relating to three drugs test-
ing to judge the effectiveness in reducing blood pressure for three different
groups of people:
Amount of Blood Pressure Reduction in Millimeters of Mercury
Drug
Group of People X Y Z
A 14 10 11
15 9 11
B 12 7 10
11 8 11
C 10 11 8
11 11 7
Do the drugs act differently? Are the different groups of people affected differently?
Answer the above questions taking a significance level of 5%.
21. The following table gives the monthly sales (in thousand rupees) of a certain
firm in three states by its four salesmen. Set up an ANOVA table for the
information given below. Calculate the F-coefficients and state whether the
difference between sales affected by the four salesmen and the difference between
sales affected in the three states are significant at the 5% level of significance.
Salesman
States
A B C D
X 5 4 4 7
3 5 9 8
Y 7 8 5 4
3 8 7 5
Z 9 6 6 7
5 4 3 1
22. The following are the number of mistakes made in 5 successive days by 4
technicians working for a photographic laboratory. Test whether the difference
among the four sample means can be attributed to chance. [Test at a level of
significance α = 0.01 ].
I II III IV
6 14 10 9
14 9 12 12
10 12 7 8
8 10 15 10
11 14 11 11
Hints: T=213, CF=2268.45, TSS=114.55, SSR=12.95, SSE=101.6, F = 1.47 < F (16, 3) = 8.7, Accept
23. The following data represents the number of units of production per day turned
out by different workers using 4 different types of machines.
Machine Type
Workers A B C D
1 44 38 47 36
2 46 40 52 43
3 34 36 44 32
4 43 38 46 33
5 38 42 49 39
1. Test whether the five men differ with respect to mean productivity and
2. Test whether the mean productivity is the same for the four different machine
types.
Hint: FR = 6.576 > F0.05 (4, 12) = 3.26, FC = 18.393 > F0.05 (3, 12) = 3.49.
24. Set up the analysis of variance for the following results of a Latin Square
Design. Use 0.01 level of significance.
A C B D
12 19 10 8
C B D A
18 12 6 7
B D A C
22 10 5 21
D A C B
12 7 27 17
Ans: FR = 1.12, Fc = 1.08, Fv = 11.72.
The values FR and FC are less than the table value F0.01 (3, 6) = 9.78 and the value FV is greater
than the table value F0.01 (3, 6) = 9.78
so we conclude that there is no significant difference due to rows and columns but there is significant
difference due to the treatments(letters).
25. The figures in the following 5 × 5 Latin square are the numbers of minutes
engines E1, E2, E3, E4 and E5, tuned up by mechanics M1, M2, M3, M4
and M5, ran with a gallon of fuel A, B, C, D and E.
E1 E2 E3 E4 E5
M1 A B C D E
31 24 20 20 18
M2 B C D E A
21 27 23 25 31
M3 C D E A B
21 27 25 29 21
M4 D E A B C
21 25 33 25 22
M5 E A B C D
21 37 24 24 20
Use the level of significance α = 0.01 to test 1. The null hypothesis H0 that
there is no difference in the performance of the five engines.
2. H0 that the persons who tuned up these engines have no effect on their
performance.
3. H0 that the engines perform equally well with each of the fuels.
(Ans: FR = 2.31, Fc = 8.24, FL = 31.28.
The value FR is less than the table value F0.01 (4, 12) = 5.41, while the values FC and FL are greater
than the table value F0.01 (4, 12) = 5.41.
So we conclude that there is no significant difference due to rows (mechanics), but there is a
significant difference due to columns (engines) and due to the fuels (letters)).
26. In a Latin square experiment, given below are the yields in quintals per acre
on paddy crop carried out for testing the effect of five fertilizers A, B, C, D, E.
Analyze the data for variations.
B 25 A 18 E 27 D 30 C 27
A 19 D 31 C 29 E 26 B 23
C 28 B 22 D 33 A 18 E 27
E 28 C 26 A 20 B 25 D 33
D 32 E 25 B 23 C 28 A 20
Hint: Use coding, subtract 25 from each value
N=25, T=18, CF=12.96, TSS=483.04, SSR=3.04, SSC=14.24, SSL=454.64, SSE=11.12, FR = 1.22, FC = 3.83, FL = 122.22. Write
the conclusion.
27. Present your conclusions after doing analysis of variance to the following re-
sults of the Latin-square design experiment conducted in respect of five fertil-
izers which were used on plots of different fertility.
A B C D E
16 10 11 9 9
E C A B D
10 9 14 12 11
B D E C A
15 8 8 10 18
D E B A C
12 6 13 13 12
C A D E B
13 11 10 7 14
accepted for row factors, but rejected for Column factors and Letters
28. Analyze and interpret the following statistics concerning output of wheat per
field obtained as a result of experiment conducted to test four varieties of wheat
viz. A, B, C and D under a Latin- square design
C     B     A     D
25    23    20    20
A     D     C     B
19    19    21    18
B     A     D     C
19    14    17    20
D     C     B     A
17    20    21    15
Hint: Subtracting 20 from each value (coding),
CF=9, TSS=113, SSR=46.5, SSC=7.5, SSL=48.5, SSE=10.5, FR = 8.85, Fc = 1.428, FL = 9.23, Compare, H0 is accepted for column