MODULE 3
LARGE AND SMALL SAMPLE TESTS
Standard Error
Let u be a statistic satisfying the conditions of the central limit theorem.
Then
\[ t = \frac{u - E(u)}{\sqrt{V(u)}} \sim N(0,1) \]
The standard deviation of the sampling distribution of any statistic is called its
standard error. If $u$ is the statistic, $\sqrt{V(u)}$ is its standard error.
Testing of a hypothesis concerning the mean of a population
Let 𝜇 be the mean and 𝜎 be the S.D of a population. Consider the
hypothesis 𝐻0 : 𝜇 = 𝜇0
The alternative hypothesis may be any of the following
𝐻1 : 𝜇 > 𝜇0
𝐻1 : 𝜇 < 𝜇0
𝐻1 : 𝜇 ≠ 𝜇0
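Case 1: σ is known
We know that the sample mean $\bar{x}$ is the best test statistic for $\mu$, with
\[ E(\bar{x}) = \mu, \qquad V(\bar{x}) = \frac{\sigma^2}{n} \]
So
\[ S.E.(\bar{x}) = \frac{\sigma}{\sqrt{n}} \]
We have
\[ t = \frac{\bar{x} - \mu}{\sigma/\sqrt{n}} = \frac{(\bar{x} - \mu)\sqrt{n}}{\sigma} \sim N(0,1) \]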
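a) Consider the alternative hypothesis $H_1: \mu < \mu_0$
The test statistic, computed under $H_0$, is
\[ t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} \]
We reject the hypothesis $H_0$ when $t < -t_\alpha$, where $t_\alpha$ is so
determined that $P(t < -t_\alpha) = \alpha$.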
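b) Consider the alternative hypothesis $H_1: \mu \neq \mu_0$
The test statistic, computed under $H_0$, is
\[ t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} \]
We reject the hypothesis $H_0$ when $|t| \geq t_{\alpha/2}$, where $t_{\alpha/2}$ is
so determined that $P(|t| \geq t_{\alpha/2}) = \alpha$.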
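c) Consider the alternative hypothesis $H_1: \mu > \mu_0$
The test statistic, computed under $H_0$, is
\[ t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{\sigma} \]
We reject the hypothesis $H_0$ when $t > t_\alpha$, where $t_\alpha$ is so
determined that $P(t > t_\alpha) = \alpha$.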
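The three rejection rules above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `z_test_mean`, the `alternative` labels and the numbers in the usage lines are assumptions made for the example, not part of the notes. The critical values come from the standard normal quantile function in Python's `statistics` module.

```python
from math import sqrt
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha=0.05, alternative="two-sided"):
    """Test H0: mu = mu0 when sigma is known.

    Returns (t, critical_value, reject_H0).
    """
    t = (xbar - mu0) * sqrt(n) / sigma
    z = NormalDist()  # standard normal N(0, 1)
    if alternative == "less":            # H1: mu < mu0, reject when t < -t_alpha
        crit = z.inv_cdf(1 - alpha)
        return t, crit, t < -crit
    if alternative == "greater":         # H1: mu > mu0, reject when t > t_alpha
        crit = z.inv_cdf(1 - alpha)
        return t, crit, t > crit
    if alternative == "two-sided":       # H1: mu != mu0, reject when |t| >= t_{alpha/2}
        crit = z.inv_cdf(1 - alpha / 2)
        return t, crit, abs(t) >= crit
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical data: n = 100, sample mean 52, known sigma 10, H0: mu = 50.
t, crit, reject = z_test_mean(52, 50, 10, 100)
print(round(t, 3), round(crit, 3), reject)  # 2.0 1.96 True
```

With these made-up figures the two-sided and right-tailed tests reject $H_0$ at the 5% level, while the left-tailed test does not.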
Case 2: σ is unknown
If $\sigma$ is unknown and the sample is large, take the sample s.d. $s$ as an
approximation to $\sigma$. So the test statistic is
\[ t = \frac{(\bar{x} - \mu_0)\sqrt{n}}{s} \]
Carry out the test as described in Case 1.
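As a rough illustration of Case 2, the sketch below computes the statistic directly from raw data. The function name `large_sample_mean_test` and the sample are hypothetical, and the code assumes that $s$ is computed with divisor $n$ (`statistics.pstdev`); the notes do not say which divisor is intended, and for large $n$ the choice makes little difference.

```python
from math import sqrt
from statistics import NormalDist, fmean, pstdev


def large_sample_mean_test(sample, mu0, alpha=0.05):
    """Two-sided large-sample test of H0: mu = mu0 with sigma replaced by s."""
    n = len(sample)
    xbar = fmean(sample)
    s = pstdev(sample)  # assumption: sample s.d. with divisor n
    t = (xbar - mu0) * sqrt(n) / s
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return t, abs(t) >= crit


# Hypothetical sample of size 100 with mean 50 and s = 2; H0: mu = 49.5.
sample = [48, 52] * 50
t, reject = large_sample_mean_test(sample, 49.5)
print(t, reject)
```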
Testing equality of the means of two populations
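Case 1: σ1 and σ2 are known
Let $\mu_1$ and $\mu_2$ be the means and $\sigma_1$ and $\sigma_2$ be the S.D. of the two populations.
Let samples of sizes $n_1$ and $n_2$ be taken, and let $\bar{x}_1$ and $\bar{x}_2$ be the
sample means and $s_1$ and $s_2$ be the sample S.D.
Suppose we have to test $H_0: \mu_1 = \mu_2$.
We know that (writing the second parameter as the standard deviation)
\[ \bar{x}_1 \sim N\!\left(\mu_1, \frac{\sigma_1}{\sqrt{n_1}}\right) \quad \text{and} \quad \bar{x}_2 \sim N\!\left(\mu_2, \frac{\sigma_2}{\sqrt{n_2}}\right) \]
\[ E(\bar{x}_1 - \bar{x}_2) = \mu_1 - \mu_2 = 0 \quad \text{under the hypothesis} \]
\[ V(\bar{x}_1 - \bar{x}_2) = \frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2} \]
Therefore
\[ t = \frac{(\bar{x}_1 - \bar{x}_2) - E(\bar{x}_1 - \bar{x}_2)}{\sqrt{V(\bar{x}_1 - \bar{x}_2)}} \sim N(0,1) \]
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \sim N(0,1) \]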
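a) Consider the alternative hypothesis $H_1: \mu_1 < \mu_2$
The test statistic is
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
We reject the hypothesis $H_0$ when $t < -t_\alpha$, where $t_\alpha$ is so
determined that $P(t < -t_\alpha) = \alpha$.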
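b) Consider the alternative hypothesis $H_1: \mu_1 \neq \mu_2$
The test statistic is
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
We reject the hypothesis $H_0$ when $|t| \geq t_{\alpha/2}$, where $t_{\alpha/2}$ is
so determined that $P(|t| \geq t_{\alpha/2}) = \alpha$.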
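c) Consider the alternative hypothesis $H_1: \mu_1 > \mu_2$
The test statistic is
\[ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}} \]
We reject the hypothesis $H_0$ when $t > t_\alpha$, where $t_\alpha$ is so
determined that $P(t > t_\alpha) = \alpha$.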
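A minimal Python sketch of the two-sample test with known standard deviations follows. The function name `z_test_two_means`, its argument names and the figures in the usage lines are assumptions made for illustration only.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_means(xbar1, xbar2, sigma1, sigma2, n1, n2,
                     alpha=0.05, alternative="two-sided"):
    """Test H0: mu1 = mu2 when sigma1 and sigma2 are known.

    Returns (t, reject_H0).
    """
    se = sqrt(sigma1 ** 2 / n1 + sigma2 ** 2 / n2)  # S.E. of xbar1 - xbar2
    t = (xbar1 - xbar2) / se
    z = NormalDist()
    if alternative == "less":            # H1: mu1 < mu2
        return t, t < -z.inv_cdf(1 - alpha)
    if alternative == "greater":         # H1: mu1 > mu2
        return t, t > z.inv_cdf(1 - alpha)
    if alternative == "two-sided":       # H1: mu1 != mu2
        return t, abs(t) >= z.inv_cdf(1 - alpha / 2)
    raise ValueError("alternative must be 'less', 'greater' or 'two-sided'")


# Hypothetical figures: means 75 and 72, sigmas 8 and 6, sample sizes 64 and 36.
t, reject = z_test_two_means(75, 72, 8, 6, 64, 36)
print(round(t, 3), reject)  # 2.121 True
```

Here the variance of the difference is $64/64 + 36/36 = 2$, so $t = 3/\sqrt{2}$.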
Case 2: σ is unknown
If $\sigma$ is unknown, take
\[ \sigma_1^2 = \sigma_2^2 = \sigma^2 = \frac{n_1 s_1^2 + n_2 s_2^2}{n_1 + n_2} \]
in the test statistic of Case 1.
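To make the pooled estimate concrete, here is a small sketch that computes $\sigma^2$ from the two sample standard deviations and substitutes it into the two-sample statistic with $\sigma_1 = \sigma_2 = \sigma$. The function names and the numbers are hypothetical.

```python
from math import sqrt


def pooled_variance(n1, s1, n2, s2):
    """Pooled estimate (n1*s1^2 + n2*s2^2) / (n1 + n2) of the common variance."""
    return (n1 * s1 ** 2 + n2 * s2 ** 2) / (n1 + n2)


def pooled_test_statistic(xbar1, xbar2, n1, s1, n2, s2):
    """Two-sample statistic with sigma1^2 = sigma2^2 replaced by the pooled estimate."""
    var = pooled_variance(n1, s1, n2, s2)
    return (xbar1 - xbar2) / sqrt(var * (1 / n1 + 1 / n2))


# Hypothetical samples: n1 = 60, s1 = 4, n2 = 40, s2 = 3, means 20 and 18.5.
print(pooled_variance(60, 4, 40, 3))                        # (960 + 360) / 100
print(round(pooled_test_statistic(20, 18.5, 60, 4, 40, 3), 4))
```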
Testing the hypothesis that a proportion has a specified value
(𝑯𝟎 : 𝒑 = 𝒑𝟎)
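Let $p$ denote the proportion of units in the population possessing a characteristic, and let
$q = 1 - p$ be the proportion not possessing it.
Consider the hypothesis $H_0: p = p_0$.
Let a sample of size $n$ be taken and let $x$ be the number of units
possessing the characteristic; then $x \sim B(n, p)$.
So, under $H_0$ (with $q_0 = 1 - p_0$),
\[ E(x) = np_0 \quad \text{and} \quad V(x) = np_0q_0 \]
Test statistic
\[ t = \frac{x - E(x)}{\sqrt{V(x)}} \sim N(0,1) \]
\[ t = \frac{x - np_0}{\sqrt{np_0q_0}} \sim N(0,1) \]
Dividing the numerator and the denominator by $n$,
\[ t = \frac{\dfrac{x}{n} - p_0}{\dfrac{1}{n}\sqrt{np_0q_0}} = \frac{\dfrac{x}{n} - p_0}{\sqrt{\dfrac{np_0q_0}{n^2}}} = \frac{\dfrac{x}{n} - p_0}{\sqrt{\dfrac{p_0q_0}{n}}} \sim N(0,1) \]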
If the alternative hypothesis is
$H_1: p \neq p_0$, then reject $H_0$ when $|t| \geq t_{\alpha/2}$
$H_1: p > p_0$, then reject $H_0$ when $t > t_\alpha$
$H_1: p < p_0$, then reject $H_0$ when $t < -t_\alpha$
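The sketch below evaluates the test statistic for a proportion in both of its forms, the count form $(x - np_0)/\sqrt{np_0q_0}$ and the proportion form $(x/n - p_0)/\sqrt{p_0q_0/n}$, to show that they agree. The function name `z_test_proportion` and the figures are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def z_test_proportion(x, n, p0, alpha=0.05):
    """Two-sided test of H0: p = p0.

    Returns (t from counts, t from proportions, reject_H0).
    """
    q0 = 1 - p0
    t_counts = (x - n * p0) / sqrt(n * p0 * q0)
    t_props = (x / n - p0) / sqrt(p0 * q0 / n)
    crit = NormalDist().inv_cdf(1 - alpha / 2)  # t_{alpha/2}
    return t_counts, t_props, abs(t_counts) >= crit


# Hypothetical data: 60 of 100 sampled units have the characteristic; H0: p = 0.5.
t_counts, t_props, reject = z_test_proportion(60, 100, 0.5)
print(round(t_counts, 6), round(t_props, 6), reject)
```

Both forms give $t = 10/5 = 2$ for these figures, up to floating-point rounding.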
Testing equality of proportion in two populations(𝑯𝟎 : 𝒑𝟏 = 𝒑𝟐 )
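Let samples of sizes $n_1$ and $n_2$ be taken from the two populations and
let $x_1$ and $x_2$ be the number of units possessing the specified
characteristic in the two samples.
Sample proportions
\[ p_1' = \frac{x_1}{n_1} \quad \text{and} \quad p_2' = \frac{x_2}{n_2} \]
Let $p_1 = p_2 = p$ and $q = 1 - p$.
Under the hypothesis
\[ E(p_1') = E\!\left(\frac{x_1}{n_1}\right) = \frac{1}{n_1}E(x_1) = \frac{1}{n_1}\,n_1p = p \]
\[ E(p_2') = E\!\left(\frac{x_2}{n_2}\right) = \frac{1}{n_2}E(x_2) = \frac{1}{n_2}\,n_2p = p \]
\[ V(p_1') = V\!\left(\frac{x_1}{n_1}\right) = \frac{1}{n_1^2}V(x_1) = \frac{1}{n_1^2}\,n_1pq = \frac{pq}{n_1} \]
Similarly
\[ V(p_2') = \frac{pq}{n_2} \]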
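Test statistic
\[ t = \frac{\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right) - E\!\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right)}{\sqrt{V\!\left(\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}\right)}} \sim N(0,1) \]
Since the samples are independent, the variance of the difference is the sum of the variances, so
\[ t = \frac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{\dfrac{pq}{n_1} + \dfrac{pq}{n_2}}} \sim N(0,1) \]
\[ t = \frac{\dfrac{x_1}{n_1} - \dfrac{x_2}{n_2}}{\sqrt{pq\left(\dfrac{1}{n_1} + \dfrac{1}{n_2}\right)}} \sim N(0,1) \]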
If $p$ is unknown, then estimate it by the pooled sample proportion
\[ p = \frac{n_1p_1' + n_2p_2'}{n_1 + n_2} = \frac{x_1 + x_2}{n_1 + n_2} \]
If the alternative hypothesis is
$H_1: p_1 \neq p_2$, then reject $H_0$ when $|t| \geq t_{\alpha/2}$
$H_1: p_1 > p_2$, then reject $H_0$ when $t > t_\alpha$
$H_1: p_1 < p_2$, then reject $H_0$ when $t < -t_\alpha$
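As an illustrative sketch, the two-proportion test with $p$ estimated by the pooled sample proportion can be coded as follows. The function name `z_test_two_proportions` and the counts are assumptions for the example.

```python
from math import sqrt
from statistics import NormalDist


def z_test_two_proportions(x1, n1, x2, n2, alpha=0.05):
    """Two-sided test of H0: p1 = p2 with p estimated from the pooled samples.

    Returns (pooled p, t, reject_H0).
    """
    p1, p2 = x1 / n1, x2 / n2                # sample proportions p1', p2'
    p = (n1 * p1 + n2 * p2) / (n1 + n2)      # pooled estimate of the common p
    q = 1 - p
    t = (p1 - p2) / sqrt(p * q * (1 / n1 + 1 / n2))
    crit = NormalDist().inv_cdf(1 - alpha / 2)
    return p, t, abs(t) >= crit


# Hypothetical counts: 45 of 100 in the first sample, 30 of 100 in the second.
p, t, reject = z_test_two_proportions(45, 100, 30, 100)
print(round(p, 3), round(t, 3), reject)
```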
Goodness of fit
Let there be $k$ classes and let $O_i$ (observed frequency) be the number of
sample values falling in the $i$th class. Let $E_i$ be the expected frequency of
the $i$th class. Then
\[ \chi^2 = \sum_{i=1}^{k} \frac{(O_i - E_i)^2}{E_i} \]
follows the chi-square distribution with $k - r - 1$ degrees of freedom, where $r$
is the number of independent constraints to be satisfied by the
frequencies.
If $\chi^2 > \chi^2_\alpha$ we reject $H_0$,
where $P(\chi^2 > \chi^2_\alpha \mid H_0) = \alpha$.
POINTS TO BE REMEMBERED
Sample size $n$ should be large (more than 50).
Theoretical frequency of each class should be at least 5. If any class
has frequency less than 5, that class should be combined with the
adjacent class.
Degrees of freedom:
    If the hypothesis directly specifies the theoretical frequencies
    or the rule for determining the theoretical frequencies, the d.f.
    will be one less than the number of classes (i.e., $k - 1$ if the
    number of classes is $k$).
    If $r$ parameters are estimated for the calculation of the
    theoretical frequencies, the d.f. is $k - r - 1$, where $k$ is the
    number of classes.
    If the classification is in the form of a two-way table
    (contingency table) with $c$ columns and $r$ rows, and
    no parameters are estimated, then the d.f. is $(c - 1)(r - 1)$.
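The goodness-of-fit statistic and its degrees of freedom can be sketched as below. The function name `chi_square_gof`, the `estimated_params` argument and the die-rolling counts are hypothetical. The critical value is taken from a chi-square table rather than computed, since the Python standard library has no chi-square quantile function.

```python
def chi_square_gof(observed, expected, estimated_params=0):
    """Return (chi-square statistic, degrees of freedom) for a goodness-of-fit test."""
    if len(observed) != len(expected):
        raise ValueError("observed and expected must have the same length")
    if any(e < 5 for e in expected):
        # Rule from the notes: combine such a class with the adjacent class first.
        raise ValueError("every expected frequency should be at least 5")
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    df = len(observed) - estimated_params - 1   # k - r - 1
    return chi2, df


# Hypothetical data: 120 rolls of a die, H0: all six faces equally likely.
observed = [18, 22, 20, 25, 15, 20]
expected = [20] * 6
chi2, df = chi_square_gof(observed, expected)
critical_5pct = 11.070  # tabulated chi-square value for 5 d.f. at alpha = 0.05
print(round(chi2, 3), df, chi2 > critical_5pct)  # 2.9 5 False
```

Since the hypothesis specifies the expected frequencies directly, no parameters are estimated and the d.f. is $k - 1 = 5$.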
Testing of independence of qualitative characteristics
Consider two qualitative characteristics A and B divided into r and s classes
respectively (i.e. $A_1, A_2, \ldots, A_r$ and $B_1, B_2, \ldots, B_s$). Such a classification, in which
attributes are divided into more than two classes, is known as manifold
classification. The various cell frequencies can be expressed in the following
table, known as an $r \times s$ contingency table:

          B_1     B_2     ...   B_j     ...   B_s     Total
A_1       f_11    f_12    ...   f_1j    ...   f_1s    f_1.
A_2       f_21    f_22    ...   f_2j    ...   f_2s    f_2.
...
A_i       f_i1    f_i2    ...   f_ij    ...   f_is    f_i.
...
A_r       f_r1    f_r2    ...   f_rj    ...   f_rs    f_r.
Total     f_.1    f_.2    ...   f_.j    ...   f_.s    f_..
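\[ P(A_i) = \text{probability that a person possesses the attribute } A_i = \frac{f_{i\cdot}}{f_{\cdot\cdot}} \]
\[ P(B_j) = \text{probability that a person possesses the attribute } B_j = \frac{f_{\cdot j}}{f_{\cdot\cdot}} \]
If the characteristics are independent, the probability that any
observation will fall in the cell $A_iB_j$ is
\[ P(A_iB_j) = \frac{f_{i\cdot}}{f_{\cdot\cdot}} \times \frac{f_{\cdot j}}{f_{\cdot\cdot}} \]
So the expected frequency in this cell is
\[ f_{\cdot\cdot} \times \frac{f_{i\cdot}}{f_{\cdot\cdot}} \times \frac{f_{\cdot j}}{f_{\cdot\cdot}} = \frac{f_{i\cdot} \times f_{\cdot j}}{f_{\cdot\cdot}} \]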
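\[ \chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j \frac{\left(f_{ij} - \dfrac{f_{i\cdot} \times f_{\cdot j}}{f_{\cdot\cdot}}\right)^2}{\dfrac{f_{i\cdot} \times f_{\cdot j}}{f_{\cdot\cdot}}} \]
follows the chi-square distribution with $(r - 1)(s - 1)$ d.f.
If $\chi^2 > \chi^2_\alpha$, the hypothesis that the characteristics are independent is to
be rejected.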
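A short sketch of the independence test for an $r \times s$ table follows. The function name `chi_square_independence` and the table are hypothetical, and the critical value is a tabulated figure rather than a computed one.

```python
def chi_square_independence(table):
    """Return (chi-square statistic, d.f.) for an r x s table given as a list of rows."""
    row_totals = [sum(row) for row in table]          # f_i.
    col_totals = [sum(col) for col in zip(*table)]    # f_.j
    grand_total = sum(row_totals)                     # f_..
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, f_ij in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (f_ij - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)       # (r - 1)(s - 1)
    return chi2, df


# Hypothetical 2 x 3 table of observed frequencies.
table = [[20, 30, 50],
         [30, 30, 40]]
chi2, df = chi_square_independence(table)
critical_5pct = 5.991  # tabulated chi-square value for 2 d.f. at alpha = 0.05
print(round(chi2, 3), df, chi2 > critical_5pct)  # 3.111 2 False
```

For this table every row total is 100, so the expected frequencies are 25, 30 and 45 in each row, and the hypothesis of independence is not rejected at the 5% level.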
Show that in a $2 \times 2$ contingency table where the frequencies are $a, b, c, d$,
\[ \chi^2 = \frac{(a+b+c+d)(ad-bc)^2}{(a+b)(c+d)(b+d)(a+c)} \]
The given table is

          A       Not A   Total
B         a       b       a+b
Not B     c       d       c+d
Total     a+c     b+d     n = a+b+c+d

The expected value in the cell (1,1) is $\dfrac{(a+b)(a+c)}{n}$
The expected value in the cell (1,2) is $\dfrac{(a+b)(b+d)}{n}$
The expected value in the cell (2,1) is $\dfrac{(c+d)(a+c)}{n}$
The expected value in the cell (2,2) is $\dfrac{(c+d)(b+d)}{n}$
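\[ \chi^2 = \frac{\left(a - \dfrac{(a+b)(a+c)}{n}\right)^2}{\dfrac{(a+b)(a+c)}{n}} + \cdots + \frac{\left(d - \dfrac{(c+d)(b+d)}{n}\right)^2}{\dfrac{(c+d)(b+d)}{n}} \qquad (1) \]
The first term is
\[ \frac{\left(a - \dfrac{(a+b)(a+c)}{n}\right)^2}{\dfrac{(a+b)(a+c)}{n}} = \frac{\left[na - (a+b)(a+c)\right]^2}{n(a+b)(a+c)} \]
\[ = \frac{\left[(a+b+c+d)a - (a+b)(a+c)\right]^2}{n(a+b)(a+c)} \]
\[ = \frac{\left[a^2 + ba + ca + da - (a^2 + ac + ba + bc)\right]^2}{n(a+b)(a+c)} \]
\[ = \frac{(ad - bc)^2}{n(a+b)(a+c)} \]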
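Similarly
\[ \frac{\left(b - \dfrac{(a+b)(b+d)}{n}\right)^2}{\dfrac{(a+b)(b+d)}{n}} = \frac{(ad - bc)^2}{n(a+b)(b+d)} \]
\[ \frac{\left(c - \dfrac{(c+d)(a+c)}{n}\right)^2}{\dfrac{(c+d)(a+c)}{n}} = \frac{(ad - bc)^2}{n(c+d)(a+c)} \]
\[ \frac{\left(d - \dfrac{(c+d)(b+d)}{n}\right)^2}{\dfrac{(c+d)(b+d)}{n}} = \frac{(ad - bc)^2}{n(c+d)(b+d)} \]
So
\[ \chi^2 = \frac{(ad-bc)^2}{n(a+b)(a+c)} + \frac{(ad-bc)^2}{n(a+b)(b+d)} + \frac{(ad-bc)^2}{n(c+d)(a+c)} + \frac{(ad-bc)^2}{n(c+d)(b+d)} \]
\[ = \frac{(ad-bc)^2}{n}\left[\frac{1}{(a+b)(a+c)} + \frac{1}{(a+b)(b+d)} + \frac{1}{(c+d)(a+c)} + \frac{1}{(c+d)(b+d)}\right] \]
Bringing the four fractions to the common denominator $(a+b)(a+c)(c+d)(b+d)$,
\[ = \frac{(ad-bc)^2}{n} \cdot \frac{(c+d)(b+d) + (c+d)(a+c) + (a+b)(b+d) + (a+b)(a+c)}{(a+b)(a+c)(c+d)(b+d)} \]
\[ = \frac{(ad-bc)^2}{n} \cdot \frac{(c+d)\left[(b+d) + (a+c)\right] + (a+b)\left[(b+d) + (a+c)\right]}{(a+b)(a+c)(c+d)(b+d)} \]
\[ = \frac{(ad-bc)^2}{n} \cdot \frac{(c+d)\,n + (a+b)\,n}{(a+b)(a+c)(c+d)(b+d)} \]
\[ = \frac{(ad-bc)^2}{n} \cdot \frac{n(a+b+c+d)}{(a+b)(a+c)(c+d)(b+d)} \]
\[ = \frac{(ad-bc)^2\, n}{(a+b)(a+c)(c+d)(b+d)} \]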
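As a numerical sanity check of the identity (not a substitute for the proof), the sketch below computes $\chi^2$ for a $2 \times 2$ table both from the general definition and from the shortcut formula, using exact rational arithmetic so the two results can be compared with `==`. The function names and the example frequencies are hypothetical.

```python
from fractions import Fraction


def chi2_general(a, b, c, d):
    """Chi-square from the definition: sum of (O - E)^2 / E over the four cells."""
    n = a + b + c + d
    cells = ((a, b), (c, d))
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    total = Fraction(0)
    for i in range(2):
        for j in range(2):
            expected = Fraction(row_totals[i] * col_totals[j], n)
            total += (cells[i][j] - expected) ** 2 / expected
    return total


def chi2_shortcut(a, b, c, d):
    """Chi-square from n (ad - bc)^2 / ((a+b)(c+d)(b+d)(a+c))."""
    n = a + b + c + d
    return Fraction(n * (a * d - b * c) ** 2,
                    (a + b) * (c + d) * (b + d) * (a + c))


# Hypothetical tables; both routes give the same exact value.
for freqs in [(30, 20, 20, 30), (10, 20, 30, 40)]:
    print(freqs, chi2_general(*freqs), chi2_shortcut(*freqs))
```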
Testing of Homogeneity
Let there be $k$ sets of observations and let $n_j$ be the number of
observations in the $j$th set. Let each set be classified into $r$ classes based
on the value of a variable characteristic. Let $f_{ij}$ be the number of
observations in the $i$th class in the $j$th set, so that $n_j = f_{\cdot j}$. The contingency table is

                        Sets
Class     1       2       ...   k       Total
1         f_11    f_12    ...   f_1k    f_1.
2         f_21    f_22    ...   f_2k    f_2.
...
r         f_r1    f_r2    ...   f_rk    f_r.
Total     f_.1    f_.2    ...   f_.k    f_..
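We have to examine whether the $k$ sets belong to similar populations
with the same proportion of elements in each class (homogeneous).
\[ \chi^2 = \sum_i \sum_j \frac{(O_{ij} - E_{ij})^2}{E_{ij}} = \sum_i \sum_j \frac{\left(f_{ij} - \dfrac{f_{i\cdot} \times f_{\cdot j}}{f_{\cdot\cdot}}\right)^2}{\dfrac{f_{i\cdot} \times f_{\cdot j}}{f_{\cdot\cdot}}} \]
follows the chi-square distribution with $(r - 1)(k - 1)$ d.f.
If $\chi^2 > \chi^2_\alpha$ we reject the hypothesis $H_0$.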