FALLSEM2020-21 MAT2001 ETH VL2020210107492 Reference Material I 18-Oct-2020 M
Statistic: Any statistical quantity calculated on the basis of the random sample is called
a statistic. The sample mean, sample standard deviation, sample proportion etc., are called
statistics (plural form of statistic). They will be denoted by Roman letters.
Let (x1, x2, …, xn) be an observed value of (X1, X2, ..., Xn). The collection of (x1, x2, …, xn)
is known as sample space, which will be denoted by ‘S’.
Note 1:
A set of n sample observations, say, x1, x2, …, xn, can be made on X for making inferences on the
unknown parameters. It is to be noted that these n values may vary from sample to sample. Thus,
these values can be considered as the realizations of the random variables X1, X2, ..., Xn,
which are assumed to be independent and have the same distribution as that of X. These are also
called independently and identically distributed (iid) random variables.
NOTE: The statistic itself is a random variable and has a probability distribution.
Note 2:
In Statistical Inference, the sample standard deviation is defined as
$S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}(X_i - \bar{X})^2}$,
where $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i$. It may be noted that the divisor is n – 1 instead of n.
Note 3:
The statistic itself is a random variable, until the numerical values of X1, X2, ..., Xn are observed,
and hence it has a probability distribution.
Notations to denote various population parameters and their corresponding sample
statistics are listed in Table 1.1. The notations will be used in the first four chapters of this book
with the same meaning for the sake of uniformity.
Table 1.1 Notations for Parameters and Statistics
The set of pairs (x1, x2) listed in column 2 constitute the sample space of samples of size 2 each.
Hence, the sample space is:
S = {(4,4), (4,8), (4,12), (4,16), (8,4), (8,8), (8,12), (8,16), (12,4), (12,8), (12,12), (12,16),
(16,4), (16,8), (16,12), (16,16)}
The sampling distribution of X̄, the sample mean, is determined and is presented in Table 1.3.
Table 1.3 Sampling Distribution of Sample Mean
Note 4: The sample obtained under sampling with replacement from a finite population satisfies
the conditions for a random sample as described earlier.
Note 5: If the sample values are selected without replacement, the independence
property of X1, X2, ..., Xn will be violated. Hence, it will not be a random sample.
Note 6: When the sample size is greater than or equal to 30, in most of the text books, the sample
is termed as a large sample. Also, the sample of size less than 30 is termed as small sample.
However, in practice, there is no rigidity in this number i.e., 30, and that depends on the nature
of the population and the sample.
Note 7: The learners may recall from XI Standard Textbook that some of the probability distributions
possess the additive property. For example, if X1, X2, ..., Xn are iid N(μ, σ²) random variables, then the
probability distributions of X1 + X2 + ... + Xn and X̄ are respectively N(nμ, nσ²) and N(μ, σ²/n). These
two distributions, from the point of view of statistical inference, can be considered respectively as the
sampling distributions of the sample total and sample mean of a random sample drawn from the N(μ, σ²)
distribution. The notation N(μ, σ²) refers to the normal distribution having mean μ and variance σ².
Solution:
Here, the population is {4, 8, 12, 16}.
Population size (N) = 4, Sample size (n) = 2
Population mean (µ) = (4 + 8 + 12+ 16)/4 = 40/4 = 10
The population variance is calculated as
$\sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2$
$= \frac{1}{4}\left[(4-10)^2 + (8-10)^2 + (12-10)^2 + (16-10)^2\right]$
$= \frac{1}{4}(36 + 4 + 4 + 36) = 20.$
Hence, $SE(\bar{X}) = \sqrt{\sigma^2/n} = \sqrt{20/2} = \sqrt{10} \approx 3.16$.
This can also be verified from the sampling distribution of X̄ (see Table 1.3):
$V(\bar{X}) = \sum (\bar{x} - \mu)^2 \, P(\bar{X} = \bar{x})$
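The verification can be sketched in Python by enumerating the sample space directly. This is a minimal illustration, assuming sampling with replacement from the population {4, 8, 12, 16} with n = 2; the variable names are not from the text.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 8, 12, 16]
n = 2  # sample size; sampling is with replacement

# Every ordered sample of size n is equally likely.
samples = list(product(population, repeat=n))
means = [Fraction(sum(s), n) for s in samples]

# Sampling distribution of the sample mean: value -> probability.
counts = Counter(means)
dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

mu = Fraction(sum(population), len(population))
var_pop = sum((x - mu) ** 2 for x in population) / len(population)
var_mean = sum((m - mu) ** 2 * p for m, p in dist.items())

print(var_pop)                 # population variance: 20
print(var_mean)                # variance of the sample mean: 10
print(float(var_mean) ** 0.5)  # standard error of the sample mean
```

The variance of the sample mean equals σ²/n, and the standard error is its square root.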
Difference between the proportions pX and pY of two independent samples: (pX – pY)
Standard error: $\sqrt{\frac{P_X Q_X}{m} + \frac{P_Y Q_Y}{n}}$, where m and n are the sizes of the samples drawn from
the populations whose proportions are respectively PX and PY; QX = 1 – PX, QY = 1 – PY.
When PX and PY are unknown: $\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}$, where $\hat{p} = \frac{m p_X + n p_Y}{m + n}$, $\hat{q} = 1 - \hat{p}$,
and m and n are the sample sizes.
1.4 NULL HYPOTHESIS AND ALTERNATIVE HYPOTHESIS
In many practical studies, as mentioned earlier, it is necessary to make decisions about a
population or its unknown characteristics on the basis of sample observations. For example, in bio-
medical studies, we may be investigating a particular theory that the recently developed medicine
is much better than the conventional medicine in curing a disease. For this purpose, we propose a
statement on the population or the theory. Such statements are called hypotheses.
Thus, a hypothesis can be defined as a statement on the population or the values of the
unknown parameters associated with the respective probability distribution. All the hypotheses
should be tested for their validity using statistical concepts and a representative sample drawn from
the study population. ‘Hypotheses’ is the plural form of ‘hypothesis’.
A statistical test is a procedure governed by certain determined/derived rules, which lead to
a decision to reject or not reject the null hypothesis on the basis of sample values.
This process is called statistical hypotheses testing.
Statistical hypotheses testing plays an important role in various fields,
including industry, biological sciences, behavioral sciences and economics. In each hypotheses testing
problem, there are usually two hypotheses to choose between, viz., the null hypothesis and
the alternative hypothesis.
Null Hypothesis:
A hypothesis which is to be actually tested for possible rejection based on a random sample
is termed as null hypothesis, which will be denoted by H0.
Alternative Hypothesis:
A statement about the population, which contradicts the null hypothesis, depending upon
the situation, is called alternative hypothesis, which will be denoted by H1.
For example, if we test whether the population mean has a specified value μ0, then the null
hypothesis would be expressed as:
H0: μ = μ0
The alternative hypothesis may be formulated suitably as any one of the following:
(i) H1: μ ≠ μ0
(ii) H1: μ > μ0
(iii) H1: μ < μ0
The alternative hypothesis in (i) is known as two-sided alternative and the alternative
hypothesis in (ii) is known as one-sided (right) alternative and (iii) is known as one-sided (left)
alternative.
Example 1.3
A soft drink manufacturing company makes a new kind of soft drink. Daily sales of the new soft
drink, in a city, is assumed to be distributed with mean sales of ₹40,000 and standard deviation of
₹2,500 per day. The Advertising Manager of the company considers placing advertisements in local
TV Channels. He does this on 10 random days and tests to see whether or not sales has increased.
Formulate suitable null and alternative hypotheses. What would be type I and type II errors?
Solution:
The Advertising Manager is testing whether or not sales increased more than ₹40,000.
Let μ be the average amount of sales, if the advertisement does appear.
The null and alternative hypotheses can be framed based on the given information as
follows:
Null hypothesis: H0: μ = 40000
i.e., The mean sales due to the advertisement is not significantly different from ₹40,000.
Alternative hypothesis: H1: μ > 40000
i.e., Increase in the mean sales due to the advertisement is significant.
(i) If type I error occurs, then it will be concluded that the advertisement has improved
the sales. But, really, it has not.
(ii) If type II error occurs, then it will be concluded that the advertisement has not
improved the sales. But, really, the advertisement has improved the sales.
The following may be the penalties due to the occurrence of these errors:
If type I error occurs, then the company may spend towards advertisement. It may increase
the expenditure of the company. On the other hand, if type II error occurs, then the company will
not spend towards advertisement. It may not improve the sales of the company.
Then, X1 and X2 are iid random variables and they have the Bernoulli (P) distribution.
Let H0: P = 1/3 and H1: P = 2/3.
The sample space is S = {(0,0),(0,1),(1,0),(1,1)}
If T(X1, X2) represents the number of defective screws, in each random sample, then the
statistic T(X1, X2) = X1 + X2 is a random variable distributed according to the Binomial (2, P)
distribution. The possible values of T(X1, X2) are 0, 1 and 2. The values of T(X1, X2) which lead
to rejection of H0 constitute the set {1,2}.
But, the critical region is defined by the elements of S corresponding to T(X1,X2) = 1 or 2.
Thus, the critical region is {(0,1), (1,0), (1,1)} whose dimension is 2.
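The critical region above can be enumerated directly, and the probability of rejecting H0 under each hypothesis follows from the Bernoulli(P) model. The sketch below is illustrative only; the function name is an assumption, and the probabilities of the two types of error are computed here rather than quoted from the text.

```python
from fractions import Fraction
from itertools import product

# Sample space for (X1, X2) and the critical region where T = X1 + X2 is 1 or 2.
sample_space = list(product((0, 1), repeat=2))
critical_region = [s for s in sample_space if sum(s) in (1, 2)]

def rejection_probability(p):
    """P((X1, X2) falls in the critical region) for iid Bernoulli(p)."""
    total = Fraction(0)
    for x1, x2 in critical_region:
        total += (p if x1 else 1 - p) * (p if x2 else 1 - p)
    return total

type_1 = rejection_probability(Fraction(1, 3))      # reject H0 when H0 is true
type_2 = 1 - rejection_probability(Fraction(2, 3))  # fail to reject when H1 is true

print(critical_region)  # [(0, 1), (1, 0), (1, 1)]
print(type_1, type_2)   # 5/9 1/9
```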
Note 8: When the sampling distribution is continuous, the set of values of t(X) corresponding to
the rejection rule will be an interval or a union of intervals, depending on the alternative hypothesis.
It is emphasized that these intervals identify the elements of the critical region, but they do not
constitute the critical region.
When the sampling distribution of the test statistic Z is a normal distribution, the critical
values for testing H0 against the possible alternative hypothesis at two different levels of
significance, say 5% and 1% are displayed in Table 1.6.
Table 1.6 Critical values of the Z statistic
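The critical values used in the examples of this chapter (1.96, 1.645, 2.33, 2.58) can be reproduced from the inverse cdf of the N(0,1) distribution. The sketch below uses Python's standard `statistics.NormalDist`; the helper name `critical_value` and the labels for the alternatives are assumptions for illustration.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the N(0, 1) distribution

def critical_value(alpha, alternative):
    """Critical value of Z at level alpha for the given form of H1."""
    if alternative == "two-sided":
        return std_normal.inv_cdf(1 - alpha / 2)  # z_{alpha/2}
    if alternative == "right":
        return std_normal.inv_cdf(1 - alpha)      # z_alpha
    if alternative == "left":
        return std_normal.inv_cdf(alpha)          # -z_alpha
    raise ValueError("alternative must be 'two-sided', 'right' or 'left'")

for alpha in (0.05, 0.01):
    for alt in ("two-sided", "right", "left"):
        print(alpha, alt, round(critical_value(alpha, alt), 3))
```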
Example 1.5
Suppose a pizza restaurant claims its average pizza delivery time is 30 minutes. But you
believe that the restaurant takes more than 30 minutes. Now, the null and the alternate hypotheses
can be formulated as
H0 : μ = 30 minutes and H1: μ > 30 minutes
Suppose that the decision is taken based on the delivery times of 4 randomly chosen pizza
deliveries of the restaurant. Let X1, X2, X3, and X4 represent the delivery times on these four
occasions. Also, let H0 be rejected when the sample mean exceeds 31. Then, the critical region is
Critical Region = $\left\{(x_1, x_2, x_3, x_4) \mid \bar{x} = \frac{x_1 + x_2 + x_3 + x_4}{4} > 31\right\}$
In this case, $P(\bar{X} > 31)$ is the area that falls at the right end under the curve representing
the sampling distribution of X̄. Hence, this test can be categorized as a right-tailed test.
H0 : μ = 5 and H1: μ ≠ 5.
Suppose that the decision on H0 is made based on the diameter of 10 randomly selected
ball-bearings. Let Xi, i = 1, 2, …, 10 represent the diameter of the randomly chosen ball bearings.
Then, the critical region is
Critical Region = $\left\{(x_1, x_2, \ldots, x_{10}) \mid \bar{x} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} < 4.75 \text{ or } > 5.10\right\}$
In this case, $P(\bar{X} < 4.75)$ is the area, which will fall at the left end and $P(\bar{X} > 5.10)$ is the
area, which will fall at the right end under the curve representing the sampling distribution of X̄.
This kind of test can be categorized as a two-tailed test (see Figure 1.3).
Step 1 : Describe the population and its parameter(s). Frame the null hypothesis (H0) and
alternative hypothesis (H1).
Step 2 : Describe the sample i.e., data.
Step 3 : Specify the desired level of significance, α.
Step 4 : Specify the test statistic and its sampling distribution under H0.
Step 5 : Calculate the value of the test statistic under H0 for given sample.
Step 6 : Find the critical value(s) (table value(s)) from the statistical table generated from the
sampling distribution of the test statistic under H0 corresponding to α.
Step 7 : Decide on rejecting or not rejecting the null hypothesis based on the rejection rule
which compares the calculated value(s) of the test statistic with the table value(s).
Now, let us see some of the large sample tests, which apply the above general procedure.
As mentioned in Note-6, for large samples, the size of the sample is greater than or equal to 30.
In the case of two samples considered for a hypotheses testing problem, the test is a large sample
test, when the sizes of both the samples are greater than or equal to 30.
1.9 TEST OF HYPOTHESES FOR POPULATION MEAN
(Population variance is known)
Procedure:
Step 1 : Let µ and σ2 be respectively the mean and the variance of the population under study,
where σ2 is known. If µ0 is an admissible value of µ, then frame the null hypothesis
as H0: µ = µ0 and choose the suitable alternative hypothesis from
(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0
Step 2 : Let (X1, X2, …, Xn) be a random sample of n observations drawn from the population,
where n is large (n ≥ 30).
Step 3 : Let the level of significance be α.
Step 4 : Consider the test statistic $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$ under H0. Here, X̄ represents the sample
mean, which is defined in Note 2. The approximate sampling distribution of the
test statistic under H0 is the N(0,1) distribution.
Step 5 : Calculate the value of Z for the given sample (x1, x2, ..., xn) as
$z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.7
A company producing LED bulbs finds that mean life span of the population of its bulbs
is 2000 hours with a standard deviation of 150 hours. A sample of 100 bulbs randomly chosen
is found to have the mean life span of 1950 hours. Test, at 5% level of significance, whether the
mean life span of the bulbs is significantly different from 2000 hours.
Solution:
Step 1 : Let μ and σ represent respectively the mean and standard deviation of the probability
distribution of the life span of the bulbs. It is given that σ = 150 hours. The null and
alternative hypotheses are
Null hypothesis: H0: μ = 2000
i.e., the mean life span of the bulbs is not significantly different from 2000 hours.
Alternative hypothesis: H1 : μ ≠ 2000
i.e., the mean life span of the bulbs is significantly different from 2000 hours.
It is a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 100, Sample mean (x̄) = 1950 hours
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, under H0.
Under the null hypothesis H0, Z follows the N(0,1) distribution.
Step 5 : Calculation of test statistic
$z_0 = \frac{1950 - 2000}{150/\sqrt{100}} = -3.33$
Thus, |z0| = 3.33
Step 6 : Critical value
Since H1 is a two-sided alternative, the critical value at α = 0.05 is ze = z0.025 = 1.96.
(see Table 1.6).
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. For the given sample
information, the rejection rule holds i.e., |z0| = 3.33 > ze = 1.96. Hence, H0 is rejected
in favour of H1: μ ≠ 2000. Thus, the mean life span of the LED bulbs is significantly
different from 2000 hours.
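The seven steps of this example can be sketched as a small Python function. This is an illustration under the assumptions of Section 1.9 (known σ, large n); the function name and the labels for the alternatives are not from the text.

```python
from math import sqrt
from statistics import NormalDist

def one_sample_z_test(x_bar, mu0, sigma, n, alpha, alternative="two-sided"):
    """Return (z0, ze, reject) for H0: mu = mu0 with known sigma."""
    z0 = (x_bar - mu0) / (sigma / sqrt(n))
    nd = NormalDist()  # N(0, 1)
    if alternative == "two-sided":
        ze = nd.inv_cdf(1 - alpha / 2)
        reject = abs(z0) >= ze
    elif alternative == "right":
        ze = nd.inv_cdf(1 - alpha)
        reject = z0 > ze
    elif alternative == "left":
        ze = nd.inv_cdf(alpha)  # this is -z_alpha
        reject = z0 < ze
    else:
        raise ValueError("unknown alternative")
    return z0, ze, reject

# Data of Example 1.7: n = 100, sample mean 1950, sigma = 150, alpha = 5%.
z0, ze, reject = one_sample_z_test(1950, 2000, 150, 100, 0.05)
print(round(z0, 2), round(ze, 2), reject)
```

The computed |z0| exceeds the critical value, which agrees with the decision to reject H0.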
Example 1.8
The mean breaking strength of cables supplied by a manufacturer is 1900 N/m² with a
standard deviation of 120 N/m². The manufacturer introduced a new technique in the manufacturing
process and claimed that the breaking strength of the cables has increased. In order to test the
claim, a sample of 60 cables is tested. It is found that the mean breaking strength of the sampled
cables is 1960 N/m². Can we support the claim at 1% level of significance?
Solution:
Step 1 : Let μ and σ represent respectively the mean and standard deviation of the probability
distribution of the breaking strength of the cables. It is given that σ = 120 N/m². The
null and alternative hypotheses are
Null hypothesis: H0: μ = 1900
i.e., the mean breaking strength of the cables is not significantly different from
1900 N/m².
Alternative hypothesis: H1: μ > 1900
i.e., the mean breaking strength of the cables is significantly more than 1900 N/m².
It may be noted that it is a one-sided (right) alternative hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 60. Hence, it is a large sample.
Sample mean (x̄) = 1960
Step 3 : Level of significance
α = 1%
Step 4 : Test statistic
The test statistic is $Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}$, under H0.
Since n is large, under the null hypothesis, the sampling distribution of Z is the
N(0,1) distribution.
Step 5 : Calculation of test statistic
The value of Z under H0 is calculated from $z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}}$ as
$z_0 = \frac{1960 - 1900}{120/\sqrt{60}}$
Thus, z0 = 3.87
Step 6 : Critical value
Since H1 is a one-sided (right) alternative hypothesis, the critical value at α = 0.01
level of significance is ze = z0.01= 2.33 (see Table 1.6)
Step 7 : Decision
Since H1 is a one-sided (right) alternative, elements of the critical region are determined
by the rejection rule z0 > ze. Thus, it is a right-tailed test. For the given sample
information, the observed value z0 = 3.87 is greater than the critical value ze = 2.33.
Hence, the null hypothesis H0 is rejected. Therefore, the mean breaking strength of the
cables is significantly more than 1900 N/m².
Thus, the manufacturer’s claim that the breaking strength of cables has increased by
the new technique is found valid.
Procedure:
Step1 : Let µ and σ2 be respectively the mean and the variance of the population under study,
where σ2 is unknown. If µ0 is an admissible value of µ, then frame the null hypothesis
as H0: µ = µ0 and choose the suitable alternative hypothesis from
The approximate sampling distribution of the test statistic under H0 is the N(0,1)
distribution.
YOU WILL KNOW
It is important to note that the exact sampling distribution of Z is the Student’s ‘t’
distribution with (n – 1) degrees of freedom, when n is small (n < 30). This hypotheses testing
problem, when n is small, is discussed, in detail, in Chapter 2. When n is large, the Student’s
‘t’ distribution converges to the N(0,1) distribution.
Step 5 : Calculate the value of Z for the given sample (x1, x2, ..., xn) as $z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$. Here, x̄
and s are respectively the values of X̄ and S calculated for the given sample.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Example 1.9
A motor vehicle manufacturing company desires to introduce a new model motor vehicle.
The company claims that the mean fuel consumption of its new model vehicle is lower than that
of the existing model of the motor vehicle, which is 27 kms/litre. A sample of 100 vehicles of the
new model vehicle is selected randomly and their fuel consumptions are observed. It is found that
the mean fuel consumption of the 100 new model motor vehicles is 30 kms/litre with a standard
deviation of 3 kms/litre. Test the claim of the company at 5% level of significance.
Solution:
Step 1 : Let the fuel consumption of the new model motor vehicle be assumed to be distributed
according to a distribution with mean and standard deviation respectively μ and σ.
The null and alternative hypotheses are
Null hypothesis H0: μ = 27
i.e., the average fuel consumption of the company’s new model motor vehicle is not
significantly different from that of the existing model.
Alternative hypothesis H1: μ > 27
i.e., the average fuel consumption of the company’s new model motor vehicle is
significantly lower than that of the existing model. In other words, the number of
kms by the new model motor vehicle is significantly more than that of the existing
model motor vehicle.
Step 2 : Data:
The given sample information are
Size of the sample (n) = 100. Hence, it is a large sample.
Sample mean ( x )= 30
Sample standard deviation(s) = 3
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic under H0 is
$Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}$.
Since n is large, the sampling distribution of Z under H0 is the N(0,1) distribution.
Step 5 : Calculation of Test Statistic
The value of Z for the given sample information is calculated from
$z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}$ as
$z_0 = \frac{30 - 27}{3/\sqrt{100}}$
Thus, z0 = 10.
Step 6 : Critical Value
Since H1 is a one-sided (right) alternative hypothesis, the critical value at α = 0.05 is
ze = z0.05 = 1.645.
Step 7 : Decision
Since H1 is a one-sided (right) alternative, elements of the critical region are defined by
the rejection rule z0 > ze = z0.05. Thus, it is a right-tailed test. Since, for the given sample
information, z0 = 10 > ze = 1.645, H0 is rejected.
Procedure:
Step-1 : Let µX and $\sigma_X^2$ be respectively the mean and the variance of Population-1. Also, let
µY and $\sigma_Y^2$ be respectively the mean and the variance of Population-2 under study.
Here $\sigma_X^2$ and $\sigma_Y^2$ are known admissible values.
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative hypothesis
from
(i) H1: µX ≠ µY (ii) H1: µX> µY (iii) H1: µX< µY
Step 2 : Let (X1, X2, …, Xm) be a random sample of m observations drawn from Population-1 and
(Y1, Y2, …, Yn) be a random sample of n observations drawn from Population-2, where
m and n are large(i.e., m ≥ 30 and n ≥ 30). Further, these two samples are assumed to be
independent.
Step 4 : Consider the test statistic
$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}}$
under H0, where X̄ and Ȳ are respectively the means of the two samples described in Step-2.
The approximate sampling distribution of the test statistic
$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}}$
under H0 (i.e., µX = µY) is the N(0,1) distribution.
It may be noted that the test statistic, when $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, is
$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}$.
Step 5 : Calculate the value of Z for the given samples (x1, x2, ..., xm) and (y1, y2, …, yn) as
$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{\sigma_X^2}{m} + \frac{\sigma_Y^2}{n}}}$.
Here, x̄ and ȳ are respectively the values of X̄ and Ȳ for the given samples.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.10
Performance of students of X Standard in a national level talent search examination was studied.
The scores secured by randomly selected students from two districts, viz., D1 and D2 of a State were
analyzed. The number of students randomly selected from D1 and D2 are respectively 500 and 800.
Average scores secured by the students selected from D1 and D2 are respectively 58 and 57. Can the
samples be regarded as drawn from the identical populations having common standard deviation 2?
Test at 5% level of significance.
Solution:
Step 1 : Let μX and μY be respectively the mean scores secured in the national level talent
search examination by all the students from the districts D1 and D2 considered for the
study. It is given that the populations of the scores of the students of these districts
have the common standard deviation σ = 2. The null and alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., average scores secured by the students from the study districts are not significantly
different.
Alternative hypothesis: H1: µX ≠ µY
i.e., average scores secured by the students from the study districts are significantly
different. It is a two-sided alternative.
Step 2 : Data
The given sample information are
Size of the Sample-1 (m) = 500
Size of the Sample-2 (n) = 800. Hence, both the samples are large.
Mean of Sample-1 ( x ) = 58
Mean of Sample-2 ( y ) = 57
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic under the null hypothesis H0 is
$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}$.
Since both m and n are large, the sampling distribution of Z under H0 is the N(0, 1)
distribution.
Step 5 : Calculation of Test Statistic
The value of Z is calculated for the given sample information from
$z_0 = \frac{\bar{x} - \bar{y}}{\sigma\sqrt{\frac{1}{m} + \frac{1}{n}}}$ as
$z_0 = \frac{58 - 57}{2\sqrt{\frac{1}{500} + \frac{1}{800}}}$
z0 = 8.77
Step-6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is
ze = z0.025 = 1.96.
Step-7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are defined by the
rejection rule |z0| ≥ ze = z0.025. For the given sample information, |z0| = 8.77 > ze = 1.96.
It indicates that the given sample contains sufficient evidence to reject H0. Thus,
it may be decided that H0 is rejected. Therefore, the average performances of the
students in the districts D1 and D2 in the national level talent search examination
are significantly different. Thus, the given samples are not drawn from identical
populations.
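The arithmetic of this example can be checked with a short Python sketch. It assumes the setting of the example (independent large samples, known common standard deviation σ = 2); the function name is an assumption for illustration.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(x_bar, y_bar, var_x, var_y, m, n):
    """z0 for H0: mu_X = mu_Y with known population variances."""
    return (x_bar - y_bar) / sqrt(var_x / m + var_y / n)

sigma = 2     # common population standard deviation
alpha = 0.05  # level of significance

z0 = two_sample_z(58, 57, sigma ** 2, sigma ** 2, 500, 800)
ze = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject = abs(z0) >= ze

print(round(z0, 2), round(ze, 2), reject)
```

With equal variances, the general denominator reduces to σ√(1/m + 1/n), so the same function covers both forms of the statistic.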
1.12 TEST OF HYPOTHESES FOR EQUALITY OF MEANS OF TWO
POPULATIONS (POPULATION VARIANCES ARE UNKNOWN)
Procedure:
Step-1 : Let µX and $\sigma_X^2$ be respectively the mean and the variance of Population-1. Also, let
µY and $\sigma_Y^2$ be respectively the mean and the variance of Population-2 under study.
Here $\sigma_X^2$ and $\sigma_Y^2$ are assumed to be unknown.
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative hypothesis
from
(i) H1: µX ≠ µY (ii) H1: µX> µY (iii) H1: µX< µY
Step 2 : Let (X1, X2, …, Xm) be a random sample of m observations drawn from Population-1
and (Y1, Y2, …, Yn) be a random sample of n observations drawn from Population-2,
where m and n are large (m ≥30 and n ≥30). Here, these two samples are assumed to
be independent.
Step 5 : Calculate the value of Z for the given samples (x1, x2, ..., xm) and (y1, y2, …, yn) as
$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}}$.
Here x̄ and ȳ are respectively the values of X̄ and Ȳ for the given samples.
Also, $s_X^2$ and $s_Y^2$ are respectively the values of $S_X^2$ and $S_Y^2$ for the given samples.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table
Example 1.11
A Model Examination was conducted to XII Standard students in the subject of Statistics.
A District Educational Officer wanted to analyze the Gender-wise performance of the students
using the marks secured by randomly selected boys and girls. Sample measures were calculated
and the details are presented below:
Solution:
Step 1 : Let μX and μY denote respectively the average marks secured by boys and girls in
the Model Examination conducted to the XII Standard students in the subject of
Statistics. Then, the null and the alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., there is no significant difference in the performance of the students with respect to
their gender.
Alternative hypothesis: H1: µX ≠ µY
i.e., the performance of the students differs significantly with respect to gender. It is
a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are
Gender of the Students    Sample Size    Sample Mean    Sample Standard Deviation
Boys                      m = 100        x̄ = 50         sX = 4
Girls                     n = 150        ȳ = 51         sY = 5
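Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic under the null hypothesis H0 is
$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\frac{S_X^2}{m} + \frac{S_Y^2}{n}}}$.
The sampling distribution of Z under H0 is the N(0,1) distribution.
Step 5 : Calculation of Test Statistic
The value of Z is calculated for the given sample information from
$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\frac{s_X^2}{m} + \frac{s_Y^2}{n}}}$ as
$z_0 = \frac{50 - 51}{\sqrt{\frac{4^2}{100} + \frac{5^2}{150}}}$
Thus, z0 = −1.75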
Step 6 : Critical value
Since H1 is a two-sided alternative, the critical value at 5% level of significance is
ze = z0.025 = 1.96.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by
the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. But, |z0| = 1.75 is less than
the critical value ze = 1.96. Hence, it may be inferred that the given sample information
does not provide sufficient evidence to reject H0. Therefore, it may be decided that
there is no sufficient evidence in the given sample to conclude that the performance of
boys and girls in the Model Examination conducted in the subject of Statistics differs
significantly.
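A quick numerical check of this example in Python, using the sample standard deviations in place of the unknown population values as in Section 1.12. The variable names are chosen for illustration only.

```python
from math import sqrt
from statistics import NormalDist

m, x_bar, s_x = 100, 50, 4  # boys: size, mean, standard deviation
n, y_bar, s_y = 150, 51, 5  # girls: size, mean, standard deviation
alpha = 0.05

# Estimated standard error of the difference between the sample means.
se = sqrt(s_x ** 2 / m + s_y ** 2 / n)
z0 = (x_bar - y_bar) / se
ze = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject = abs(z0) >= ze

print(round(z0, 2), round(ze, 2), reject)
```

Here |z0| falls below the critical value, so the rejection rule does not hold.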
Procedure:
Step 1 : Let P denote the proportion of the population possessing the qualitative characteristic
(attribute) under study. If p0 is an admissible value of P, then frame the null hypothesis
as H0: P = p0 and choose the suitable alternative hypothesis from
(i) H1: P ≠ p0 (ii) H1: P > p0 (iii) H1: P < p0
Step 2 : Let p be the proportion of the sample observations possessing the attribute, where n is
large, np > 5 and n(1 – p) > 5.
Step 3 : Specify the level of significance, α.
Step 4 : Consider the test statistic $Z = \frac{p - P}{\sqrt{\frac{PQ}{n}}}$ under H0. Here, Q = 1 – P.
The approximate sampling distribution of the test statistic under H0 is the N(0,1)
distribution.
Step 5 : Calculate the value of Z under H0 for the given data as $z_0 = \frac{p - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$, where q0 = 1 – p0.
Step 6 : Choose the critical value, ze, corresponding to α and H1 from the following table
Alternative Hypothesis (H1) P ≠ p0 P > p0 P < p0
Critical Value (ze) zα/2 zα -zα
Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
Example 1.12
A survey was conducted among the citizens of a city to study their preference towards
consumption of tea and coffee. Among 1000 randomly selected persons, it is found that 560 are tea-
drinkers and the remaining are coffee-drinkers. Can we conclude at 1% level of significance from
this information that both tea and coffee are equally preferred among the citizens in the city?
Solution:
Step 1 : Let P denote the proportion of people in the city who preferred to consume tea.
Then, the null and the alternative hypotheses are
Null hypothesis: H0: P = 0.5
i.e., tea and coffee are equally preferred in the city.
Alternative hypothesis: H1: P ≠ 0.5
i.e., tea and coffee are not equally preferred in the city. It is a two-sided alternative
hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 1000. Hence, it is a large sample.
No. of tea-drinkers = 560
Sample proportion (p) = 560/1000 = 0.56
Step 3 : Level of significance
α = 1%
Step 4 : Test statistic
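Since n is large, np = 560 > 5 and n(1 – p) = 440 > 5, the test statistic under the null
hypothesis is $Z = \frac{p - P}{\sqrt{\frac{PQ}{n}}}$.
Its sampling distribution under H0 is the N(0,1) distribution.
Step 5 : Calculation of test statistic
The value of Z for the given sample information is calculated from $z_0 = \frac{p - p_0}{\sqrt{\frac{p_0 q_0}{n}}}$ as
$z_0 = \frac{0.56 - 0.50}{\sqrt{\frac{0.5 \times 0.5}{1000}}}$
Thus, z0 = 3.79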
Step 6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at 1% level of
significance is zα/2 = z0.005 = 2.58.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by
the rejection rule |z0| ≥ ze. Thus it is a two-tailed test. Since |z0| = 3.79 > ze = 2.58,
reject H0 at 1% level of significance. Therefore, there is significant evidence to
conclude that the preference of tea and coffee are different.
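The calculation in this example can be sketched in Python as follows. The function name is an assumption for illustration; the data (560 tea-drinkers out of 1000, p0 = 0.5, α = 1%) are those of the example.

```python
from math import sqrt
from statistics import NormalDist

def one_proportion_z(successes, n, p0):
    """z0 for H0: P = p0, using the standard error under H0."""
    p = successes / n
    q0 = 1 - p0
    return (p - p0) / sqrt(p0 * q0 / n)

alpha = 0.01
z0 = one_proportion_z(560, 1000, 0.5)
ze = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject = abs(z0) >= ze

print(round(z0, 2), round(ze, 2), reject)
```

Note that the standard error uses p0 and q0 (the values under H0), not the sample proportion.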
Procedure:
Step 1 : Let PX and PY denote respectively the proportions of Population-1 and Population-2
possessing the qualitative characteristic (attribute) under study. Frame the null
hypothesis as H0: PX=PY and choose the suitable alternative hypothesis from
(i) H1: PX≠ PY (ii) H1: PX>PY (iii) H1: PX<PY
Step 2 : Let p X and pY denote respectively the proportions of the samples of sizes m and n
drawn from Population-1 and Population-2 possessing the attribute, where m and n are
large (i.e., m ≥ 30 and n ≥ 30). Also, mpX > 5, m(1 − pX) > 5, npY > 5 and n(1 − pY) > 5.
Here, these two samples are assumed to be independent.
Step 3 : Specify the level of significance, α.
Step 4 : Consider the test statistic
$Z = \frac{(p_X - p_Y) - (P_X - P_Y)}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$ under H0. Here,
$\hat{p} = \frac{m p_X + n p_Y}{m + n}$, $\hat{q} = 1 - \hat{p}$. The approximate sampling distribution of the test statistic
under H0 is the N(0,1) distribution.
Step 5 : Calculate the value of Z for the given data as
$z_0 = \frac{p_X - p_Y}{\sqrt{\hat{p}\hat{q}\left(\frac{1}{m} + \frac{1}{n}\right)}}$.
Step 6 : Choose the critical value, ze, corresponding to α and H1 from the following table
Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.
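The procedure above can be sketched in Python. The function name is an assumption for illustration, and the counts used are those quoted in Example 1.13 (400 of 500 persons in City-1, 600 of 800 persons in City-2, α = 0.05).

```python
from math import sqrt
from statistics import NormalDist

def two_proportion_z(x_count, m, y_count, n):
    """z0 for H0: P_X = P_Y using the pooled estimate of the proportion."""
    p_x = x_count / m
    p_y = y_count / n
    p_hat = (m * p_x + n * p_y) / (m + n)  # pooled sample proportion
    q_hat = 1 - p_hat
    return (p_x - p_y) / sqrt(p_hat * q_hat * (1 / m + 1 / n))

alpha = 0.05
z0 = two_proportion_z(400, 500, 600, 800)
ze = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
reject = abs(z0) >= ze

print(round(z0, 2), round(ze, 2), reject)
```

The pooled proportion equals the total number of units possessing the attribute divided by m + n, which is why the two sample proportions are weighted by their sample sizes.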
Example 1.13
A study was conducted to investigate the interest of people living in cities towards self-
employment. Among randomly selected 500 persons from City-1, 400 persons were found to be
self-employed. From City-2, 800 persons were selected randomly and among them 600 persons
are self-employed. Do the data indicate that the two cities are significantly different with respect
to prevalence of self-employment among the persons? Choose the level of significance as
α = 0.05.
Solution:
Step1 : Let PX and PY be respectively the proportions of self-employed people in City-1 and
City-2. Then, the null and alternative hypotheses are
Null hypothesis: H 0 : PX = PY
i.e., there is no significant difference between the proportions of self-employed
people in City-1 and City-2.
Alternative hypothesis: H1 : PX ≠ PY
i.e., difference between the proportions of self-employed people in City-1 and City-2
is significant. It is a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are