
FALLSEM2020-21 MAT2001 ETH VL2020210107492 Reference Material I 18-Oct-2020 M

The document discusses the concepts of random sampling, statistics, and sampling distributions, emphasizing the importance of independent and identically distributed (iid) random variables. It explains the definitions of null and alternative hypotheses in statistical testing, along with the potential for type I and type II errors in decision-making. Additionally, it provides examples and formulas related to standard errors and statistical measures, highlighting their applications in various fields.

Uploaded by

Lakshit Mangla

Random sample: Any set of realizations (X1, X2, ..., Xn) made on X under independent and
identical conditions is called a random sample.

Statistic: Any statistical quantity calculated on the basis of the random sample is called
a statistic. The sample mean, sample standard deviation, sample proportion etc., are called
statistics (plural form of statistic). They will be denoted by Roman letters.

Let (x1, x2, …, xn) be an observed value of (X1, X2, ..., Xn). The collection of all possible
values (x1, x2, …, xn) is known as the sample space, which will be denoted by ‘S’.

Note 1:
A set of n sample observations, say, x1, x2, …, xn, can be made on X for making inferences on the
unknown parameters. It is to be noted that these n values may vary from sample to sample. Thus,
these values can be considered as the realizations of the random variables X1, X2, ..., Xn,
which are assumed to be independent and have the same distribution as that of X. These are also
called independently and identically distributed (iid) random variables.

NOTE: The statistic itself is a random variable and has a probability distribution.

Note 2:
In Statistical Inference, the sample standard deviation is defined as
\[ S = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(X_i - \bar{X}\right)^2}, \qquad \text{where } \bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i . \]
It may be noted that the divisor is n – 1 instead of n.
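As a quick numerical illustration of Note 2, the following Python sketch computes the sample standard deviation with the divisor n – 1 and compares it with `statistics.stdev` from the standard library, which uses the same divisor. The data values and the helper name `sample_sd` are hypothetical and chosen only for illustration.

```python
import math
import statistics


def sample_sd(values):
    """Sample standard deviation with divisor n - 1, as defined in Note 2."""
    n = len(values)
    mean = sum(values) / n
    return math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))


# Hypothetical observations (not taken from the text).
data = [12, 15, 11, 18, 14]

s = sample_sd(data)                   # divisor n - 1
s_library = statistics.stdev(data)    # standard library, also divisor n - 1
s_divisor_n = statistics.pstdev(data) # divisor n, shown only for contrast

print(s, s_library, s_divisor_n)
```

The value obtained with the divisor n is always somewhat smaller than the one obtained with n – 1, and the gap shrinks as n grows.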
Note 3:

The statistic itself is a random variable, until the numerical values of X1, X2, ..., Xn are observed,
and hence it has a probability distribution.
Notations to denote various population parameters and their corresponding sample
statistics are listed in Table 1.1. The notations will be used in the first four chapters of this book
with the same meaning for the sake of uniformity.
Table 1.1 Notations for Parameters and Statistics

Statistical measure    Parameter    Statistic    Value of the Statistic for a given sample
Mean                   μ            $\bar{X}$    $\bar{x}$
Standard deviation     σ            S            s
Proportion             P            p            p0

1.2 SAMPLING DISTRIBUTION


The probability distribution of a statistic is called sampling distribution of the statistic.
In other words, it is the probability distribution of possible values of the statistic, whose values
are computed from possible random samples of same size.
The following example will help to understand this concept.
Example 1.1
Suppose that a population consists of 4 elements such as 4, 8, 12 and 16. These may be
considered as the values of a random variable, say, X. Let a random sample of size 2 be drawn
from this population under sampling with replacement scheme. Then, the possible number of
samples is $4^2 = 16$.
It is to be noted that, if we take samples of size n each from a finite population of size N, then
the number of samples will be $N^n$ under with replacement scheme and $^{N}C_{n}$ under without
replacement scheme.
In each of the $4^2 = 16$ samples, the sample elements x1 and x2 can be considered as the values of
the two iid random variables X1 and X2. The possible samples, which could be drawn from the
above population and their respective means are presented in Table 1.2.
Table 1.2 Possible Samples and their Means

Sample Number    Sample elements (x1, x2)    Sample Mean $\bar{x}$


1 4,4 4
2 4,8 6
3 4,12 8
4 4,16 10
5 8,4 6
6 8,8 8
7 8,12 10
8 8,16 12
9 12,4 8
10 12,8 10
11 12,12 12
12 12,16 14
13 16,4 10
14 16,8 12
15 16,12 14
16 16,16 16

The set of pairs (x1, x2) listed in column 2 constitute the sample space of samples of size 2 each.
Hence, the sample space is:
S = {(4,4), (4,8), (4,12), (4,16), (8,4), (8,8), (8,12), (8,16), (12,4), (12,8), (12,12), (12,16),
(16,4), (16,8), (16,12), (16,16)}
The sampling distribution of $\bar{X}$, the sample mean, is determined and is presented in Table 1.3.
Table 1.3 Sampling Distribution of Sample Mean

Sample mean: $\bar{x}$                       4      6      8      10     12     14     16     Total
Probability: P($\bar{X}$ = $\bar{x}$)        1/16   2/16   3/16   4/16   3/16   2/16   1/16   1
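The construction of Tables 1.2 and 1.3 can be reproduced with a short Python sketch. It enumerates all ordered samples of size 2 drawn with replacement from the population {4, 8, 12, 16} and tabulates the probability of each value of the sample mean. The variable names are illustrative only.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

population = [4, 8, 12, 16]
n = 2  # sample size

# All ordered samples under sampling with replacement: N**n = 4**2 of them.
samples = list(product(population, repeat=n))

# Sample mean of every sample, kept exact with Fraction.
means = [Fraction(sum(sample), n) for sample in samples]

# Sampling distribution: each sample is equally likely.
counts = Counter(means)
sampling_dist = {m: Fraction(c, len(samples)) for m, c in sorted(counts.items())}

for m in sampling_dist:
    print(f"mean = {m}: {counts[m]}/{len(samples)}")
```

Each of the 16 samples has probability 1/16 because the draws are independent and every population element is equally likely on each draw.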

Note 4: The sample obtained under sampling with replacement from a finite population satisfies
the conditions for a random sample as described earlier.
Note 5: If the sample values are selected under without replacement scheme, independence
property of X1, X2, ... Xn will be violated. Hence it will not be a random sample.
Note 6: When the sample size is greater than or equal to 30, in most of the text books, the sample
is termed as a large sample. Also, the sample of size less than 30 is termed as small sample.
However, in practice, there is no rigidity in this number i.e., 30, and that depends on the nature
of the population and the sample.

Note 7: The learners may recall from XI Standard Textbook that some of the probability distributions
possess the additive property. For example, if X1, X2, ..., Xn are iid N(μ, σ²) random variables, then the
probability distributions of X1 + X2 + ... + Xn and $\bar{X}$ are respectively N(nμ, nσ²) and N(μ, σ²/n). These
two distributions, from the statistical inference point of view, can be considered respectively as the sampling
distributions of the sample total and sample mean of a random sample drawn from the N(μ, σ²)
distribution. The notation N(μ, σ²) refers to the normal distribution having mean μ and variance σ².

1.3 STANDARD ERROR


The standard deviation of the sampling distribution of a statistic is defined as the standard
error of the statistic, which is abbreviated as SE.
For example, the standard deviation of the sampling distribution of the sample mean, $\bar{X}$, is
known as the standard error of the sample mean, or SE($\bar{X}$).
If the random variables X1, X2, ..., Xn are independent and have the same distribution with
mean μ and variance σ², then the variance of $\bar{X}$ becomes
\[ V(\bar{X}) = V\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n^2}\sum_{i=1}^{n} V(X_i) = \frac{1}{n^2}\sum_{i=1}^{n} \sigma^2 = \frac{n\sigma^2}{n^2} = \frac{\sigma^2}{n}. \]
Thus,
\[ SE(\bar{X}) = \frac{\sigma}{\sqrt{n}}. \]
Also, note that the mean of $\bar{X}$ is
\[ E(\bar{X}) = E\left(\frac{1}{n}\sum_{i=1}^{n} X_i\right) = \frac{1}{n}\sum_{i=1}^{n} E(X_i) = \frac{1}{n}\sum_{i=1}^{n} \mu = \frac{n\mu}{n} = \mu. \]
Example 1.2
Calculate the standard error of $\bar{X}$ for the sampling distribution obtained in Example 1.1.

Solution:
Here, the population is {4, 8, 12, 16}.
Population size (N) = 4, Sample size (n) = 2
Population mean (µ) = (4 + 8 + 12+ 16)/4 = 40/4 = 10
The population variance is calculated as
\[ \sigma^2 = \frac{1}{N}\sum_{i=1}^{N}(X_i - \mu)^2 = \frac{1}{4}\left[(4-10)^2 + (8-10)^2 + (12-10)^2 + (16-10)^2\right] = \frac{1}{4}\left[36 + 4 + 4 + 36\right] = 20. \]
Hence,
\[ SE(\bar{X}) = \sqrt{\frac{\sigma^2}{n}} = \sqrt{\frac{20}{2}} = \sqrt{10}. \]
This can also be verified from the sampling distribution of $\bar{X}$ (see Table 1.3):
\[ V(\bar{X}) = \sum (\bar{x} - \mu)^2 \, P(\bar{X} = \bar{x}), \]
where the summation is taken over all values of $\bar{x}$. Thus,
\[ V(\bar{X}) = (4-10)^2\frac{1}{16} + (6-10)^2\frac{2}{16} + (8-10)^2\frac{3}{16} + (10-10)^2\frac{4}{16} + (12-10)^2\frac{3}{16} + (14-10)^2\frac{2}{16} + (16-10)^2\frac{1}{16} \]
\[ = \frac{1}{16}(36 + 32 + 12 + 0 + 12 + 32 + 36) = 10. \]
Hence, the standard deviation of the sampling distribution of $\bar{X}$ is $\sqrt{10}$.
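The two routes to the standard error in Example 1.2 can be checked numerically. The Python sketch below computes the population variance and σ/√n, then recomputes the variance of the sample mean by enumerating all samples of size 2 drawn with replacement. The variable names are illustrative only.

```python
import math
from fractions import Fraction
from itertools import product

population = [4, 8, 12, 16]
N, n = len(population), 2

# Route 1: population mean and variance (divisor N), then SE = sqrt(sigma^2 / n).
mu = Fraction(sum(population), N)
sigma2 = sum((x - mu) ** 2 for x in population) / N
se_formula = math.sqrt(sigma2 / n)

# Route 2: variance of the sample mean over all equally likely samples.
means = [Fraction(sum(sample), n) for sample in product(population, repeat=n)]
var_xbar = sum((m - mu) ** 2 for m in means) / len(means)
se_enumeration = math.sqrt(var_xbar)

print(sigma2, var_xbar, se_formula, se_enumeration)
```

Both routes give the same value because the samples are drawn with replacement, so X1 and X2 are independent and V($\bar{X}$) = σ²/n applies exactly.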

Standard Errors of some of the frequently referred statistics are listed in Table 1.4.
Table 1.4 Statistics and their Standard Errors

Statistic: Sample proportion p
Standard error: $\sqrt{\dfrac{PQ}{n}}$, where P is the population proportion and Q = 1 – P.

Statistic: Difference between the means $\bar{X}$ and $\bar{Y}$ of two independent samples: ($\bar{X} - \bar{Y}$)
Standard error:
(i) $\sqrt{\dfrac{\sigma_X^2}{m} + \dfrac{\sigma_Y^2}{n}}$, where m and n are the sizes of samples drawn from the
populations whose variances are $\sigma_X^2$ and $\sigma_Y^2$ respectively.
(ii) $\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}$, where σ² is the common variance of the populations.

Statistic: Difference between the proportions pX and pY of two independent samples: (pX – pY)
Standard error:
(i) $\sqrt{\dfrac{P_X Q_X}{m} + \dfrac{P_Y Q_Y}{n}}$, where m and n are sizes of the samples drawn from
the populations whose proportions are respectively $P_X$ and $P_Y$; $Q_X = 1 - P_X$, $Q_Y = 1 - P_Y$.
(ii) $\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}$, where $\hat{p} = \dfrac{m p_X + n p_Y}{m + n}$, $\hat{q} = 1 - \hat{p}$, m and n are sample sizes,
when $P_X$ and $P_Y$ are unknown.
1.4 NULL HYPOTHESIS AND ALTERNATIVE HYPOTHESIS
In many practical studies, as mentioned earlier, it is necessary to make decisions about a
population or its unknown characteristics on the basis of sample observations. For example, in bio-
medical studies, we may be investigating a particular theory that the recently developed medicine
is much better than the conventional medicine in curing a disease. For this purpose, we propose a
statement on the population or the theory. Such statements are called hypotheses.
Thus, a hypothesis can be defined as a statement on the population or the values of the
unknown parameters associated with the respective probability distribution. All the hypotheses
should be tested for their validity using statistical concepts and a representative sample drawn from
the study population. ‘Hypotheses’ is the plural form of ‘hypothesis’.
A statistical test is a procedure governed by certain determined/derived rules, which leads to
a decision on whether or not to reject the null hypothesis on the basis of sample values.
This process is called statistical hypotheses testing.
Statistical hypotheses testing plays an important role in various fields
including industry, biological sciences, behavioral sciences and Economics. In each hypotheses testing
problem, there are usually two hypotheses to choose between, viz., the null hypothesis and the
alternative hypothesis.

Null Hypothesis:
A hypothesis which is to be actually tested for possible rejection based on a random sample
is termed as null hypothesis, which will be denoted by H0.
YOU WILL KNOW

(i) Generally, it is a hypothesis of no difference in the case of comparison.


(ii) Assigning a value to the unknown parameter in the case of single sample problems
(iii) Suggesting a suitable model to the given environment in the case of model construction.
(iv) The given two attributes are independent in the case of Chi-square test for independence
of attributes.

Alternative Hypothesis:
A statement about the population, which contradicts the null hypothesis, depending upon
the situation, is called alternative hypothesis, which will be denoted by H1.
For example, if we test whether the population mean has a specified value μ0, then the null
hypothesis would be expressed as:

H0: μ = μ0
The alternative hypothesis may be formulated suitably as any one of the following:

(i) H1: μ ≠ μ0
(ii) H1: μ > μ0
(iii) H1: μ < μ0
The alternative hypothesis in (i) is known as two-sided alternative and the alternative
hypothesis in (ii) is known as one-sided (right) alternative and (iii) is known as one-sided (left)
alternative.

1.5 ERRORS IN STATISTICAL HYPOTHESES TESTING


A statistical decision in a hypotheses testing problem is either rejecting or not rejecting
H0 based on a given random sample. Statistical decisions are governed by certain rules, developed
by applying a statistical theory, which are known as decision rules. The decision rule leading to
rejection of H0 is called the rejection rule.
The null hypothesis may be either true or false, in reality. Under this circumstance, there will
arise four possible situations in each hypotheses testing or decision making problem as displayed
in Table 1.5.

Table 1.5 Decision Table

                    H0 is true          H0 is false
Reject H0           Type I error        Correct decision
Do not Reject H0    Correct decision    Type II error

It must be recognized that the final decision of rejecting H0 or not rejecting H0 may be
incorrect. The error committed by rejecting H0, when H0 is really true, is called type I error. The
error committed by not rejecting H0, when H0 is false, is called type II error.

Example 1.3
A soft drink manufacturing company makes a new kind of soft drink. Daily sales of the new soft
drink, in a city, is assumed to be distributed with mean sales of ₹40,000 and standard deviation of
₹2,500 per day. The Advertising Manager of the company considers placing advertisements in local
TV Channels. He does this on 10 random days and tests to see whether or not sales has increased.
Formulate suitable null and alternative hypotheses. What would be type I and type II errors?

Solution:
The Advertising Manager is testing whether or not the mean daily sales have increased beyond ₹40,000.
Let μ be the average amount of sales, if the advertisement does appear.
The null and alternative hypotheses can be framed based on the given information as
follows:
Null hypothesis: H0: μ = 40000
i.e., The mean sales due to the advertisement is not significantly different from ₹40,000.
Alternative hypothesis: H1: μ > 40000
i.e., Increase in the mean sales due to the advertisement is significant.
(i) If type I error occurs, then it will be concluded that the advertisement has improved
the sales. But, really, it has not.

(ii) If type II error occurs, then it will be concluded that the advertisement has not
improved the sales. But, really, the advertisement has improved the sales.
The following may be the penalties due to the occurrence of these errors:
If type I error occurs, then the company may keep spending on an advertisement that does not
improve the sales, which increases the expenditure of the company. On the other hand, if type II
error occurs, then the company will not spend on the advertisement and may lose the improvement
in sales that the advertisement would have brought.

1.6 LEVEL OF SIGNIFICANCE, CRITICAL REGION AND CRITICAL VALUE(S)

In a given hypotheses testing problem, the maximum probability with which we would be
willing to tolerate the occurrence of type I error is called level of significance of the test. This
probability is usually denoted by ‘α’. Level of significance is specified before samples are drawn
to test the hypothesis.
The level of significance normally chosen in a hypotheses testing problem is 0.05 (5%)
or 0.01 (1%). If, for example, the level of significance is chosen as 5%, then it means that, when
H0 is actually true and the test is applied to 100 random samples drawn under identical and
independent conditions, H0 would be rejected wrongly in at most about 5 of them, on average.
Equivalently, when H0 is actually true, the test leads to the correct decision of not rejecting
H0 with probability at least 95%.
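The long-run meaning of the level of significance can be illustrated by simulation. The sketch below is only an illustration under stated assumptions: the population is taken to be normal with known standard deviation, H0 is true, a right-tailed Z test is used, and the values of `mu0`, `sigma`, `n`, `trials` and `seed` are hypothetical. It counts how often H0 is rejected wrongly.

```python
import math
import random
from statistics import NormalDist


def type_one_error_rate(mu0=0.0, sigma=1.0, n=30, alpha=0.05,
                        trials=20000, seed=1):
    """Fraction of samples, drawn with H0 true, in which H0 is rejected."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha)  # right-tailed critical value
    rejections = 0
    for _ in range(trials):
        # Sample from the population specified by H0 (assumed normal here).
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        xbar = sum(sample) / n
        z0 = (xbar - mu0) / (sigma / math.sqrt(n))
        if z0 > z_crit:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(rate)  # expected to be close to alpha = 0.05
```

The observed proportion fluctuates around α from run to run; it approaches α as the number of simulated samples grows.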
Critical region in a hypotheses testing problem is a subset of the sample space whose
elements lead to rejection of H0. Hence, its elements have the dimension as that of the sample size,
say, n(n > 1). That is,

Critical Region = {x = (x1 , x2 , ..., xn )| H 0 is rejected} .



A subset of the sample space whose elements do not lead to rejection of H0 may be
termed as acceptance region, which is the complement of the critical region. Thus,

S = {Critical Region} ∪ {Acceptance Region}.


Test statistic, a function of statistic(s) and the known value(s) of the underlying parameter(s),
is used to make decision on H0. Consider a hypotheses testing problem, which uses a test statistic
t(X) and a constant c for deciding on H0, where X = (X1, X2, ..., Xn) and x = (x1, x2, ..., xn).
Suppose that H0 is rejected, when t(x) > c. It is to be noted here that t(X) is a scalar and is of
dimension one. Its sampling distribution is a univariate probability distribution. The values of
t(X) satisfying the condition t(x) > c will identify the samples in the sample space, which lead to
rejection of H0. It does not mean that {t | t(x) > c} is the corresponding critical region. The
value ‘c’, distinguishing the elements of the critical region and the acceptance region, is
referred to as critical value. There may be one or many critical values for a hypotheses testing
problem. The critical values are determined from the sampling distribution of the respective
test statistic under H0.
Example 1.4
Suppose an electrical equipment manufacturing industry receives screws in lots, as raw
materials. The production engineer decides to reject a lot when the number of defective screws
is one or more in a randomly selected sample of size 2.

Define
\[ X_i = \begin{cases} 1, & \text{if the } i\text{th screw is defective} \\ 0, & \text{if the } i\text{th screw is not defective} \end{cases} \qquad i = 1, 2. \]

Then, X1 and X2 are iid random variables and they have the Bernoulli (P) distribution.
Let H0 : P = 1/3 and H1 : P = 2/3.
The sample space is S = {(0,0),(0,1),(1,0),(1,1)}
If T(X1, X2) represents the number of defective screws, in each random sample, then the
statistic T(X1, X2) = X1 + X2 is a random variable distributed according to the Binomial (2, P)
distribution. The possible values of T(X1, X2) are 0, 1 and 2. The values of T(X1, X2) which lead
to rejection of H0 constitute the set {1,2}.
But, the critical region is defined by the elements of S corresponding to T(X1,X2) = 1 or 2.
Thus, the critical region is {(0,1), (1,0), (1,1)} whose dimension is 2.
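The distinction between the values of the test statistic and the critical region in Example 1.4 can be made concrete with a small enumeration. The Python sketch below lists the sample space, picks out the critical region from the rule T(x1, x2) ≥ 1 and, going one step beyond the example, computes the probabilities of type I and type II errors for H0 : P = 1/3 against H1 : P = 2/3. The helper name `prob` is hypothetical.

```python
from fractions import Fraction
from itertools import product

# Sample space of (x1, x2), each xi being 0 (not defective) or 1 (defective).
sample_space = list(product([0, 1], repeat=2))

# Critical region: samples whose number of defectives T = x1 + x2 is 1 or 2.
critical_region = [x for x in sample_space if sum(x) >= 1]


def prob(sample, p):
    """Probability of a sample when X1, X2 are iid Bernoulli(p)."""
    result = Fraction(1)
    for xi in sample:
        result *= p if xi == 1 else 1 - p
    return result


p0, p1 = Fraction(1, 3), Fraction(2, 3)

# P(reject H0 | H0 true) and P(do not reject H0 | H1 true).
type_one = sum(prob(x, p0) for x in critical_region)
type_two = sum(prob(x, p1) for x in sample_space if x not in critical_region)

print(critical_region, type_one, type_two)
```

The critical region consists of pairs (x1, x2), whereas the set {1, 2} consists of values of the statistic T; the code keeps the two apart in the same way.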

Note 8: When the sampling distribution is continuous, the set of values of t(X) corresponding to
the rejection rule will be an interval or union of intervals depending on the alternative hypothesis.
It is emphasized that these intervals identify the elements of critical region, but they do not
constitute the critical region.

When the sampling distribution of the test statistic Z is a normal distribution, the critical
values for testing H0 against the possible alternative hypothesis at two different levels of
significance, say 5% and 1% are displayed in Table 1.6.
Table 1.6 Critical values of the Z statistic

                            Level of Significance (α)
Alternative hypothesis      0.05 or 5%                 0.01 or 1%
One-sided (right)           zα = z0.05 = 1.645         zα = z0.01 = 2.33
One-sided (left)            –zα = –z0.05 = –1.645      –zα = –z0.01 = –2.33
Two-sided                   zα/2 = z0.025 = 1.96       zα/2 = z0.005 = 2.58
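The entries of Table 1.6 can be recovered from the standard normal distribution. The following Python sketch uses `statistics.NormalDist().inv_cdf` from the standard library; the function name `critical_value` and the labels used for the alternatives are hypothetical.

```python
from statistics import NormalDist

std_normal = NormalDist()  # the N(0, 1) distribution


def critical_value(alpha, alternative):
    """Critical value of the Z statistic for a given level and alternative."""
    if alternative == "two-sided":
        return std_normal.inv_cdf(1 - alpha / 2)   # z_{alpha/2}
    if alternative == "right":
        return std_normal.inv_cdf(1 - alpha)       # z_alpha
    if alternative == "left":
        return std_normal.inv_cdf(alpha)           # -z_alpha
    raise ValueError("alternative must be 'two-sided', 'right' or 'left'")


for alpha in (0.05, 0.01):
    for alternative in ("right", "left", "two-sided"):
        print(alpha, alternative, round(critical_value(alpha, alternative), 3))
```

The computed values agree with the table up to the rounding used there.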


1.7 ONE-TAILED AND TWO-TAILED TESTS
In some hypotheses testing problems, elements of the critical region may be identified by a
rejection rule of the type t(X) ≥ c. In this case, P(t(X) ≥ c) will be the area, which falls at the
right end (Figure 1.1) under the curve representing the sampling distribution of t(X). The
statistical test defined by this kind of critical region is called right-tailed test.

[Figure 1.1 Right-tailed Test]

On the other hand, suppose that the rejection rule t(X) ≤ c determines the elements of the
critical region. Then, P(t(X) ≤ c) will be the area, which falls at the left end (Figure 1.2) under
the curve representing the sampling distribution of t(X). The statistical test defined by this kind
of critical region is called left-tailed test.

[Figure 1.2 Left-tailed Test]

The above two tests are commonly known as one-tailed tests.
Note 9: It should be noted that the sampling distribution of t(X) need not always be symmetric
in shape. Sometimes, it may be positively or negatively skewed.

Example 1.5
Suppose a pizza restaurant claims its average pizza delivery time is 30 minutes. But you
believe that the restaurant takes more than 30 minutes. Now, the null and the alternate hypotheses
can be formulated as
H0 : μ = 30 minutes and H1: μ > 30 minutes
Suppose that the decision is taken based on the delivery times of 4 randomly chosen pizza
deliveries of the restaurant. Let X1, X2, X3, and X4 represent the delivery times on these four
occasions. Also, let H0 be rejected, when the sample mean exceeds 31. Then, the critical region is
\[ \text{Critical Region} = \left\{ (x_1, x_2, x_3, x_4) \;\middle|\; \bar{x} = \frac{x_1 + x_2 + x_3 + x_4}{4} > 31 \right\}. \]
In this case, P($\bar{X}$ > 31) will be the area, which falls at the right end under the curve representing
the sampling distribution of $\bar{X}$. Hence, this test can be categorized as a right-tailed test.

Suppose that H0 is rejected, when either t(X) ≤ a or t(X) ≥ b holds. In this case, P(t(X) ≤ a)
and P(t(X) ≥ b) will be the areas, which fall respectively at left and right ends under the curve
representing the sampling distribution of t(X) (Figure 1.3). The statistical test defined
with this kind of rejection rule is known as two-tailed test.

[Figure 1.3 Two-tailed Test]
Example 1.6
A manufacturer of ball-bearings, which are used in some machines, inspects to see whether
the diameter of each ball-bearing is 5 mm. If the average diameter of ball-bearings is less than
4.75 mm or greater than 5.10 mm, then such ball-bearings will cause damages to the machine.
Here the null and the alternate hypotheses are

H0 : μ = 5 and H1: μ ≠ 5.

Suppose that the decision on H0 is made based on the diameter of 10 randomly selected
ball-bearings. Let Xi, i = 1, 2, …, 10 represent the diameter of the randomly chosen ball bearings.
Then, the critical region is
\[ \text{Critical Region} = \left\{ (x_1, x_2, \ldots, x_{10}) \;\middle|\; \bar{x} = \frac{x_1 + x_2 + \cdots + x_{10}}{10} < 4.75 \ \text{or} \ \bar{x} > 5.10 \right\}. \]
In this case, P($\bar{X}$ < 4.75) is the area, which will fall at the left end and P($\bar{X}$ > 5.10) is the
area, which will fall at the right end under the curve representing the sampling distribution of $\bar{X}$.
This kind of test can be categorized as a two-tailed test (see Figure 1.3).

1.8 GENERAL PROCEDURE FOR TEST OF HYPOTHESES


The following steps constitute a general procedure, which can be followed for solving hypotheses
testing problems based on both large and small samples.

Step 1 : Describe the population and its parameter(s). Frame the null hypothesis (H0) and
alternative hypothesis (H1).
Step 2 : Describe the sample i.e., data.
Step 3 : Specify the desired level of significance, α.
Step 4 : Specify the test statistic and its sampling distribution under H0.
Step 5 : Calculate the value of the test statistic under H0 for given sample.
Step 6 : Find the critical value(s) (table value(s)) from the statistical table generated from the
sampling distribution of the test statistic under H0 corresponding to α.
Step 7 : Decide on rejecting or not rejecting the null hypothesis based on the rejection rule
which compares the calculated value(s) of the test statistic with the table value(s).


Now, let us see some of the large sample tests, which apply the above general procedure.
As mentioned in Note-6, for large samples, the size of the sample is greater than or equal to 30.
In the case of two samples considered for a hypotheses testing problem, the test is a large sample
test, when the sizes of both the samples are greater than or equal to 30.
1.9 TEST OF HYPOTHESES FOR POPULATION MEAN
(Population variance is known)

Procedure:
Step 1 : Let µ and σ² be respectively the mean and the variance of the population under study,
where σ² is known. If µ0 is an admissible value of µ, then frame the null hypothesis
as H0: µ = µ0 and choose the suitable alternative hypothesis from
(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0

Step 2 : Let (X1, X2, …, Xn) be a random sample of n observations drawn from the population,
where n is large (n ≥ 30).
Step 3 : Let the level of significance be α.

Step 4 : Consider the test statistic
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}} \]
under H0. Here, $\bar{X}$ represents the sample mean, which is defined in Note 2. The
approximate sampling distribution of the test statistic under H0 is the N(0,1) distribution.

Step 5 : Calculate the value of Z for the given sample (x1, x2, ..., xn) as
\[ z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]

Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table

Alternative Hypothesis (H1)    µ ≠ µ0      µ > µ0    µ < µ0
Critical Value (ze)            zα/2        zα        –zα

Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Alternative Hypothesis (H1)    µ ≠ µ0         µ > µ0     µ < µ0
Rejection Rule                 |z0| ≥ zα/2    z0 > zα    z0 < –zα

Example 1.7
A company producing LED bulbs finds that the mean life span of the population of its bulbs
is 2000 hours with a standard deviation of 150 hours. A sample of 100 bulbs randomly chosen
is found to have a mean life span of 1950 hours. Test, at 5% level of significance, whether the
mean life span of the bulbs is significantly different from 2000 hours.
Solution:
Step 1 : Let μ and σ represent respectively the mean and standard deviation of the probability
distribution of the life span of the bulbs. It is given that σ = 150 hours. The null and
alternative hypotheses are
Null hypothesis: H0: μ = 2000
i.e., the mean life span of the bulbs is not significantly different from 2000 hours.
Alternative hypothesis: H1 : μ ≠ 2000
i.e., the mean life span of the bulbs is significantly different from 2000 hours.
It is a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 100, Sample mean ($\bar{x}$) = 1950 hours
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic is
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \]
under H0.
Under the null hypothesis H0, Z follows the N(0,1) distribution.

Step 5 : Calculation of Test Statistic
The value of Z under H0 is calculated from
\[ z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} \]
as
\[ z_0 = \frac{1950 - 2000}{150/\sqrt{100}} = -3.33 \]
Thus, |z0| = 3.33
Step 6 : Critical value
Since H1 is a two-sided alternative, the critical value at α = 0.05 is ze = z0.025 = 1.96.
(see Table 1.6).
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined
by the rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. For the given sample
information, the rejection rule holds i.e., |z0| = 3.33 > ze = 1.96. Hence, H0 is rejected
in favour of H1: μ ≠ 2000. Thus, the mean life span of the LED bulbs is significantly
different from 2000 hours.
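Steps 4 to 7 of the procedure in Section 1.9 can be sketched as a small Python function and applied to the figures of Example 1.7. This is only an illustrative sketch: the function name `z_test_mean`, its argument names and the labels for the alternatives are hypothetical, and the critical value is taken from the standard normal distribution through `statistics.NormalDist` rather than from Table 1.6.

```python
import math
from statistics import NormalDist


def z_test_mean(xbar, mu0, sigma, n, alpha, alternative):
    """Large-sample Z test for a population mean with known sigma.

    Returns (z0, ze, reject), where reject tells whether H0 is rejected.
    """
    z0 = (xbar - mu0) / (sigma / math.sqrt(n))       # Step 5
    std_normal = NormalDist()
    if alternative == "two-sided":                   # H1: mu != mu0
        ze = std_normal.inv_cdf(1 - alpha / 2)       # Step 6
        reject = abs(z0) >= ze                       # Step 7
    elif alternative == "right":                     # H1: mu > mu0
        ze = std_normal.inv_cdf(1 - alpha)
        reject = z0 > ze
    elif alternative == "left":                      # H1: mu < mu0
        ze = -std_normal.inv_cdf(1 - alpha)
        reject = z0 < ze
    else:
        raise ValueError("alternative must be 'two-sided', 'right' or 'left'")
    return z0, ze, reject


# Example 1.7: n = 100, sample mean 1950, mu0 = 2000, sigma = 150, alpha = 5%.
z0, ze, reject = z_test_mean(1950, 2000, 150, 100, 0.05, "two-sided")
print(round(z0, 2), round(ze, 2), reject)
```

For these inputs the function reproduces the calculation above: |z0| exceeds the critical value, so H0 is rejected.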
Example 1.8
The mean breaking strength of cables supplied by a manufacturer is 1900 N/m² with a
standard deviation of 120 N/m². The manufacturer introduced a new technique in the manufacturing
process and claimed that the breaking strength of the cables has increased. In order to test the
claim, a sample of 60 cables is tested. It is found that the mean breaking strength of the sampled
cables is 1960 N/m². Can we support the claim at 1% level of significance?

Solution:
Step 1 : Let μ and σ represent respectively the mean and standard deviation of the probability
distribution of the breaking strength of the cables. It is given that σ = 120 N/m². The
null and alternative hypotheses are
Null hypothesis H0: μ = 1900
i.e., the mean breaking strength of the cables is not significantly different from
1900 N/m².
Alternative hypothesis: H1: μ > 1900
i.e., the mean breaking strength of the cables is significantly more than 1900 N/m².
It may be noted that it is a one-sided (right) alternative hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 60. Hence, it is a large sample.
Sample mean ($\bar{x}$) = 1960
Step 3 : Level of significance
α = 1%
Step 4 : Test statistic
The test statistic is
\[ Z = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}, \]
under H0.
Since n is large, under the null hypothesis, the sampling distribution of Z is the
N(0,1) distribution.
Step 5 : Calculation of test statistic
The value of Z under H0 is calculated from
\[ z_0 = \frac{\bar{x} - \mu_0}{\sigma/\sqrt{n}} = \frac{1960 - 1900}{120/\sqrt{60}} \]
Thus, z0 = 3.87
Step 6 : Critical value
Since H1 is a one-sided (right) alternative hypothesis, the critical value at α = 0.01
level of significance is ze = z0.01= 2.33 (see Table 1.6)
Step 7 : Decision
Since H1 is a one-sided (right) alternative, elements of the critical region are determined
by the rejection rule z0 > ze. Thus, it is a right-tailed test. For the given sample
information, the observed value z0 = 3.87 is greater than the critical value ze = 2.33.
Hence, the null hypothesis H0 is rejected. Therefore, the mean breaking strength of the
cables is significantly more than 1900 N/m².
Thus, the manufacturer’s claim that the breaking strength of cables has increased by
the new technique is found valid.

1.10 TEST OF HYPOTHESES FOR POPULATION MEAN
(Population variance is unknown)

Procedure:
Step 1 : Let µ and σ² be respectively the mean and the variance of the population under study,
where σ² is unknown. If µ0 is an admissible value of µ, then frame the null hypothesis
as H0: µ = µ0 and choose the suitable alternative hypothesis from

(i) H1: µ ≠ µ0 (ii) H1: µ > µ0 (iii) H1: µ < µ0


Step 2 : Let (X1, X2, …, Xn) be a random sample of n observations drawn from the population,
where n is large (n ≥ 30).
Step 3 : Specify the level of significance, α.
Step 4 : Consider the test statistic
\[ Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}} \]
under H0, where $\bar{X}$ and S are the sample mean and sample standard deviation
respectively. It may be noted that the above test statistic is obtained from Z considered
in the test described in Section 1.9 by substituting S for σ.

The approximate sampling distribution of the test statistic under H0 is the N(0,1)
distribution.
YOU WILL KNOW

It is important to note that, when the population is normal, the exact sampling distribution
of Z is the Student’s ‘t’ distribution with (n – 1) degrees of freedom. This distinction matters
when n is small (n < 30). This hypotheses testing problem, when n is small, is discussed, in
detail, in Chapter 2. When n is large, the Student’s ‘t’ distribution converges to the N(0,1)
distribution.

Step 5 : Calculate the value of Z for the given sample (x1, x2, ..., xn) as
\[ z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}}. \]
Here, $\bar{x}$ and s are respectively the values of $\bar{X}$ and S calculated for the given sample.

Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table

Alternative Hypothesis (H1)    µ ≠ µ0      µ > µ0    µ < µ0
Critical Value (ze)            zα/2        zα        –zα

Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Alternative Hypothesis (H1)    µ ≠ µ0         µ > µ0     µ < µ0
Rejection Rule                 |z0| ≥ zα/2    z0 > zα    z0 < –zα

Example 1.9
A motor vehicle manufacturing company desires to introduce a new model motor vehicle.
The company claims that the mean fuel consumption of its new model vehicle is lower than that
of the existing model of the motor vehicle, which is 27 kms/litre. A sample of 100 vehicles of the
new model vehicle is selected randomly and their fuel consumptions are observed. It is found that
the mean fuel consumption of the 100 new model motor vehicles is 30 kms/litre with a standard
deviation of 3 kms/litre. Test the claim of the company at 5% level of significance.

Solution:
Step 1 : Let the fuel efficiency (in kms/litre) of the new model motor vehicle be assumed to be
distributed according to a distribution with mean and standard deviation respectively
μ and σ. A lower fuel consumption corresponds to a larger number of kms travelled
per litre.
The null and alternative hypotheses are
Null hypothesis H0: μ = 27
i.e., the average fuel consumption of the company’s new model motor vehicle is not
significantly different from that of the existing model.
Alternative hypothesis H1: μ > 27
i.e., the average fuel consumption of the company’s new model motor vehicle is
significantly lower than that of the existing model. In other words, the number of
kms per litre by the new model motor vehicle is significantly more than that of the
existing model motor vehicle.
Step 2 : Data:
The given sample information are
Size of the sample (n) = 100. Hence, it is a large sample.
Sample mean ($\bar{x}$) = 30
Sample standard deviation(s) = 3
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
The test statistic under H0 is
\[ Z = \frac{\bar{X} - \mu_0}{S/\sqrt{n}}. \]
Since n is large, the sampling distribution of Z under H0 is the N(0,1) distribution.
Step 5 : Calculation of Test Statistic
The value of Z for the given sample information is calculated from
\[ z_0 = \frac{\bar{x} - \mu_0}{s/\sqrt{n}} \]
as
\[ z_0 = \frac{30 - 27}{3/\sqrt{100}} \]
Thus, z0 = 10.
Step 6 : Critical Value
Since H1 is a one-sided (right) alternative hypothesis, the critical value at α = 0.05 is
ze = z0.05 = 1.645.

Step 7 : Decision
Since H1 is a one-sided (right) alternative, elements of the critical region are defined by
the rejection rule z0 > ze = z0.05. Thus, it is a right-tailed test. Since, for the given sample
information, z0 = 10 > ze = 1.645, H0 is rejected.
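
The arithmetic of Example 1.9 can be checked with a short script. This is only a sketch using the figures given in the example; the variable names are chosen here for illustration, and the critical value is taken from the standard library's `statistics.NormalDist` rather than from a printed table.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.9
n = 100        # sample size
x_bar = 30.0   # sample mean (kms/litre)
s = 3.0        # sample standard deviation
mu_0 = 27.0    # value of the mean under H0
alpha = 0.05   # level of significance

# Step 5: calculated value of the test statistic
z0 = (x_bar - mu_0) / (s / sqrt(n))

# Step 6: critical value for the right-sided alternative H1: mu > mu_0
ze = NormalDist().inv_cdf(1 - alpha)

# Step 7: right-tailed rejection rule z0 > ze
reject_h0 = z0 > ze

print(f"z0 = {z0:.2f}, ze = {ze:.3f}, reject H0: {reject_h0}")
# z0 = 10.00, ze = 1.645, reject H0: True
```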

1.11 TEST OF HYPOTHESES FOR EQUALITY OF MEANS OF TWO
POPULATIONS (POPULATION VARIANCES ARE KNOWN)

Procedure:
Step 1 : Let µX and $\sigma_X^2$ be respectively the mean and the variance of Population-1. Also, let
µY and $\sigma_Y^2$ be respectively the mean and the variance of Population-2 under study.
Here $\sigma_X^2$ and $\sigma_Y^2$ are known admissible values.
Frame the null hypothesis as H0: µX = µY and choose the suitable alternative hypothesis
from
(i) H1: µX ≠ µY (ii) H1: µX> µY (iii) H1: µX< µY

Step 2 : Let (X1, X2, …, Xm) be a random sample of m observations drawn from Population-1 and
(Y1, Y2, …, Yn) be a random sample of n observations drawn from Population-2, where
m and n are large (i.e., m ≥ 30 and n ≥ 30). Further, these two samples are assumed to be
independent.

Step 3 : Specify the level of significance, α.

Step 4 : Consider the test statistic
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{\sigma_X^2}{m} + \dfrac{\sigma_Y^2}{n}}}$$
under H0, where $\bar{X}$ and $\bar{Y}$ are respectively the means of the two samples described in Step 2.
The approximate sampling distribution of the test statistic
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{\sigma_X^2}{m} + \dfrac{\sigma_Y^2}{n}}}$$
under H0 (i.e., µX = µY) is the N(0,1) distribution.

It may be noted that the test statistic, when $\sigma_X^2 = \sigma_Y^2 = \sigma^2$, is
$$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}.$$
Step 5 : Calculate the value of Z for the given samples (x1, x2, ..., xm) and (y1, y2, …, yn) as
$$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{\sigma_X^2}{m} + \dfrac{\sigma_Y^2}{n}}}.$$
Here, $\bar{x}$ and $\bar{y}$ are respectively the values of $\bar{X}$ and $\bar{Y}$ for the given samples.
Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table

Alternative Hypothesis (H1)    µX ≠ µY    µX > µY    µX < µY
Critical Value (ze)            zα/2       zα         –zα

Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Alternative Hypothesis (H1)    µX ≠ µY        µX > µY    µX < µY
Rejection Rule                 |z0| ≥ zα/2    z0 > zα    z0 < –zα

Example 1.10
Performance of students of X Standard in a national level talent search examination was studied.
The scores secured by randomly selected students from two districts, viz., D1 and D2 of a State were
analyzed. The number of students randomly selected from D1 and D2 are respectively 500 and 800.
Average scores secured by the students selected from D1 and D2 are respectively 58 and 57. Can the
samples be regarded as drawn from the identical populations having common standard deviation 2?
Test at 5% level of significance.

Solution:
Step 1 : Let μX and μY be respectively the mean scores secured in the national level talent
search examination by all the students from the districts D1 and D2 considered for the
study. It is given that the populations of the scores of the students of these districts
have the common standard deviation σ = 2. The null and alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., average scores secured by the students from the study districts are not significantly
different.
Alternative hypothesis: H1: µX ≠ µY
i.e., average scores secured by the students from the study districts are significantly
different. It is a two-sided alternative.
Step 2 : Data
The given sample information are
Size of the Sample-1 (m) = 500
Size of the Sample-2 (n) = 800. Hence, both the samples are large.
Mean of Sample-1 ($\bar{x}$) = 58
Mean of Sample-2 ($\bar{y}$) = 57
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
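The test statistic under the null hypothesis H0 is
$$Z = \frac{\bar{X} - \bar{Y}}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}.$$
Since both m and n are large, the sampling distribution of Z under H0 is the N(0, 1)
distribution.
Step 5 : Calculation of Test Statistic
The value of Z is calculated for the given sample information from
$$z_0 = \frac{\bar{x} - \bar{y}}{\sigma\sqrt{\dfrac{1}{m} + \dfrac{1}{n}}}$$
as
$$z_0 = \frac{58 - 57}{2\sqrt{\dfrac{1}{500} + \dfrac{1}{800}}} = \frac{1}{2\sqrt{0.00325}}.$$
Thus, z0 = 8.77.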
Step 6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at α = 0.05 is
ze = z0.025 = 1.96.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are defined by the
rejection rule |z0| ≥ ze = z0.025. For the given sample information, |z0| = 8.77 > ze = 1.96.
The given samples therefore contain sufficient evidence to reject H0. Hence, the average
performance of the students in the districts D1 and D2 in the national level talent search
examination differs significantly, and the given samples cannot be regarded as drawn from
identical populations.
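
The calculation in Example 1.10 can be reproduced as follows. This is a minimal sketch with the example's figures; the variable names are illustrative, and the common standard deviation σ = 2 is treated as known, as stated in the example.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.10
m, n = 500, 800              # sample sizes for districts D1 and D2
x_bar, y_bar = 58.0, 57.0    # sample means
sigma = 2.0                  # known common population standard deviation
alpha = 0.05                 # level of significance

# Step 5: test statistic for equal, known variances
z0 = (x_bar - y_bar) / (sigma * sqrt(1 / m + 1 / n))

# Step 6: critical value for the two-sided alternative
ze = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= ze
reject_h0 = abs(z0) >= ze

print(f"z0 = {z0:.2f}, ze = {ze:.2f}, reject H0: {reject_h0}")
# z0 = 8.77, ze = 1.96, reject H0: True
```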
1.12 TEST OF HYPOTHESES FOR EQUALITY OF MEANS OF TWO
POPULATIONS (POPULATION VARIANCES ARE UNKNOWN)

Procedure:
Step 1 : Let µX and $\sigma_X^2$ be respectively the mean and the variance of Population-1. Also, let
µY and $\sigma_Y^2$ be respectively the mean and the variance of Population-2 under study.
Here $\sigma_X^2$ and $\sigma_Y^2$ are assumed to be unknown.

Frame the null hypothesis as H0: µX = µY and choose the suitable alternative hypothesis
from
(i) H1: µX ≠ µY (ii) H1: µX> µY (iii) H1: µX< µY

Step 2 : Let (X1, X2, …, Xm) be a random sample of m observations drawn from Population-1
and (Y1, Y2, …, Yn) be a random sample of n observations drawn from Population-2,
where m and n are large (m ≥30 and n ≥30). Here, these two samples are assumed to
be independent.

Step 3 : Specify the level of significance, α.

Step 4 : Consider the test statistic
$$Z = \frac{(\bar{X} - \bar{Y}) - (\mu_X - \mu_Y)}{\sqrt{\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}}}$$
under H0 (i.e., µX = µY),
i.e., the above test statistic is obtained from Z considered in the test described in
Section 1.11 by substituting $S_X^2$ and $S_Y^2$ respectively for $\sigma_X^2$ and $\sigma_Y^2$.
The approximate sampling distribution of the test statistic
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}}}$$
under H0 is the N(0,1) distribution.

Step 5 : Calculate the value of Z for the given samples (x1, x2, ..., xm) and (y1, y2, …, yn) as
$$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_X^2}{m} + \dfrac{s_Y^2}{n}}}.$$
Here $\bar{x}$ and $\bar{y}$ are respectively the values of $\bar{X}$ and $\bar{Y}$ for the given samples.
Also, $s_X^2$ and $s_Y^2$ are respectively the values of $S_X^2$ and $S_Y^2$ for the given samples.

Step 6 : Find the critical value, ze, corresponding to α and H1 from the following table

Alternative Hypothesis (H1)    µX ≠ µY    µX > µY    µX < µY
Critical Value (ze)            zα/2       zα         –zα

Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Alternative Hypothesis (H1)    µX ≠ µY        µX > µY    µX < µY
Rejection Rule                 |z0| ≥ zα/2    z0 > zα    z0 < –zα

Example 1.11
A Model Examination was conducted to XII Standard students in the subject of Statistics.
A District Educational Officer wanted to analyze the Gender-wise performance of the students
using the marks secured by randomly selected boys and girls. Sample measures were calculated
and the details are presented below:

Gender    Sample Size    Sample Mean    Sample Standard Deviation
Boys      100            50             4
Girls     150            51             5

Test, at 5% level of significance, whether the performance of the students differs significantly
with respect to their gender.

Solution:
Step 1 : Let μX and μY denote respectively the average marks secured by boys and girls in
the Model Examination conducted to the XII Standard students in the subject of
Statistics. Then, the null and the alternative hypotheses are
Null hypothesis: H0: µX = µY
i.e., there is no significant difference in the performance of the students with respect to
their gender.
Alternative hypothesis: H1: µX ≠ µY
i.e., the performance of the students differs significantly with respect to their gender. It is
a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are

Gender of the Students    Sample Size    Sample Mean         Sample Standard Deviation
Boys                      m = 100        $\bar{x}$ = 50      sX = 4
Girls                     n = 150        $\bar{y}$ = 51      sY = 5

Since m ≥ 30 and n ≥ 30, both the samples are large.


Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
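The test statistic under H0 is
$$Z = \frac{\bar{X} - \bar{Y}}{\sqrt{\dfrac{S_X^2}{m} + \dfrac{S_Y^2}{n}}}.$$
The sampling distribution of Z under H0 is the N(0,1) distribution.

Step 5 : Calculation of the Test Statistic
The value of Z is calculated for the given sample information from
$$z_0 = \frac{\bar{x} - \bar{y}}{\sqrt{\dfrac{s_X^2}{m} + \dfrac{s_Y^2}{n}}}$$
as
$$z_0 = \frac{50 - 51}{\sqrt{\dfrac{4^2}{100} + \dfrac{5^2}{150}}}.$$
Thus, z0 = −1.75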
Step 6 : Critical value
Since H1 is a two-sided alternative, the critical value at 5% level of significance is
ze = z0.025 = 1.96.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by
the rejection rule |z0| ≥ ze. Thus it is a two-tailed test. But |z0| = 1.75 is less than
the critical value ze = 1.96. Hence, it may be inferred that the given sample information
does not provide sufficient evidence to reject H0. Therefore, it may be decided that
there is no sufficient evidence in the given sample to conclude that the performance of
boys and girls in the Model Examination conducted in the subject of Statistics differs
significantly.
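
Example 1.11 can be verified numerically in the same way. The sketch below uses the sample figures from the example with illustrative variable names; the sample standard deviations stand in for the unknown population values, as in the procedure of Section 1.12.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.11
m, n = 100, 150              # numbers of boys and girls sampled
x_bar, y_bar = 50.0, 51.0    # sample means
s_x, s_y = 4.0, 5.0          # sample standard deviations
alpha = 0.05                 # level of significance

# Step 5: test statistic with sample variances in place of population variances
z0 = (x_bar - y_bar) / sqrt(s_x**2 / m + s_y**2 / n)

# Step 6: critical value for the two-sided alternative
ze = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= ze
reject_h0 = abs(z0) >= ze

print(f"z0 = {z0:.2f}, ze = {ze:.2f}, reject H0: {reject_h0}")
# z0 = -1.75, ze = 1.96, reject H0: False
```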

1.13 TEST OF HYPOTHESES FOR POPULATION PROPORTION

Procedure:
Step 1 : Let P denote the proportion of the population possessing the qualitative characteristic
(attribute) under study. If p0 is an admissible value of P, then frame the null hypothesis
as H0:P = p0 and choose the suitable alternative hypothesis from

(i) H1: P ≠ p0 (ii) H1: P > p0 (iii) H1: P < p0

Step 2 : Let p be the proportion of observations possessing the attribute in a random sample
of size n, where n is large, np > 5 and n(1 – p) > 5.
Step 3 : Specify the level of significance, α.
Step 4 : Consider the test statistic
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}$$
under H0. Here, Q = 1 – P.
The approximate sampling distribution of the test statistic under H0 is the N(0,1)
distribution.

Step 5 : Calculate the value of Z under H0 for the given data as
$$z_0 = \frac{p - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}, \quad q_0 = 1 - p_0.$$
Step 6 : Choose the critical value, ze, corresponding to α and H1 from the following table
Alternative Hypothesis (H1)    P ≠ p0    P > p0    P < p0
Critical Value (ze)            zα/2      zα        -zα

Step 7 : Make decision on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Alternative Hypothesis (H1)    P ≠ p0         P > p0     P < p0
Rejection Rule                 |z0| ≥ zα/2    z0 > zα    z0 < -zα

Example 1.12
A survey was conducted among the citizens of a city to study their preference towards
consumption of tea and coffee. Among 1000 randomly selected persons, it is found that 560 are tea-
drinkers and the remaining are coffee-drinkers. Can we conclude at 1% level of significance from
this information that both tea and coffee are equally preferred among the citizens in the city?
Solution:
Step 1 : Let P denote the proportion of people in the city who preferred to consume tea.
Then, the null and the alternative hypotheses are
Null hypothesis: H0: P = 0.5
i.e., tea and coffee are equally preferred in the city.
Alternative hypothesis: H1: P ≠ 0.5
i.e., tea and coffee are not equally preferred in the city. It is a two-sided alternative
hypothesis.
Step 2 : Data
The given sample information are
Sample size (n) = 1000. Hence, it is a large sample.
No. of tea-drinkers = 560
Sample proportion (p) = 560/1000 = 0.56
Step 3 : Level of significance
α = 1%
Step 4 : Test statistic
Since n is large, np = 560 > 5 and n(1 – p) = 440 > 5, the test statistic under the null
hypothesis is
$$Z = \frac{p - P}{\sqrt{\dfrac{PQ}{n}}}.$$
Its sampling distribution under H0 is the N(0,1) distribution.

Step 5 : Calculation of Test Statistic
The value of Z can be calculated for the sample information from
$$z_0 = \frac{p - p_0}{\sqrt{\dfrac{p_0 q_0}{n}}}$$
as
$$z_0 = \frac{0.56 - 0.50}{\sqrt{\dfrac{0.5 \times 0.5}{1000}}}.$$
Thus, z0 = 3.79
Step 6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at 1% level of
significance is ze = zα/2 = z0.005 = 2.58.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by
the rejection rule |z0| ≥ ze. Thus it is a two-tailed test. Since |z0| = 3.79 > ze = 2.58,
reject H0 at 1% level of significance. Therefore, there is significant evidence to
conclude that tea and coffee are not equally preferred among the citizens in the city.
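
The figures in Example 1.12 can be checked with the short sketch below. Variable names are illustrative; the critical value computed from `statistics.NormalDist` is about 2.576, which the example rounds to 2.58.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.12
n = 1000             # sample size
tea_drinkers = 560   # number of tea-drinkers in the sample
p0 = 0.5             # value of P under H0
alpha = 0.01         # level of significance

p = tea_drinkers / n   # sample proportion
q0 = 1 - p0

# Conditions from Step 2 of the procedure
assert n * p > 5 and n * (1 - p) > 5

# Step 5: calculated value of the test statistic
z0 = (p - p0) / sqrt(p0 * q0 / n)

# Step 6: critical value for the two-sided alternative
ze = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= ze
reject_h0 = abs(z0) >= ze

print(f"z0 = {z0:.2f}, ze = {ze:.2f}, reject H0: {reject_h0}")
# z0 = 3.79, ze = 2.58, reject H0: True
```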

1.14 TEST OF HYPOTHESES FOR EQUALITY OF PROPORTIONS
OF TWO POPULATIONS

Procedure:
Step 1 : Let PX and PY denote respectively the proportions of Population-1 and Population-2
possessing the qualitative characteristic (attribute) under study. Frame the null
hypothesis as H0: PX=PY and choose the suitable alternative hypothesis from
(i) H1: PX≠ PY (ii) H1: PX>PY (iii) H1: PX<PY

Step 2 : Let pX and pY denote respectively the proportions of the samples of sizes m and n
drawn from Population-1 and Population-2 possessing the attribute, where m and n are
large (i.e., m ≥ 30 and n ≥ 30). Also, mpX > 5, m(1 − pX) > 5, npY > 5 and n(1 − pY) > 5.
Here, these two samples are assumed to be independent.
Step 3 : Specify the level of significance, α.
Step 4 : Consider the test statistic
$$Z = \frac{(p_X - p_Y) - (P_X - P_Y)}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}$$
under H0. Here,
$$\hat{p} = \frac{m p_X + n p_Y}{m + n}, \quad \hat{q} = 1 - \hat{p}.$$
The approximate sampling distribution of the test statistic
under H0 is the N(0,1) distribution.

Step 5 : Calculate the value of Z for the given data as
$$z_0 = \frac{p_X - p_Y}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}.$$

Step 6 : Choose the critical value, ze, corresponding to α and H1 from the following table

Alternative Hypothesis (H1)    PX ≠ PY    PX > PY    PX < PY
Critical Value (ze)            zα/2       zα         –zα

Step 7 : Decide on H0 choosing the suitable rejection rule from the following table
corresponding to H1.

Alternative Hypothesis (H1)    PX ≠ PY        PX > PY    PX < PY
Rejection Rule                 |z0| ≥ zα/2    z0 > zα    z0 < –zα

Example 1.13
A study was conducted to investigate the interest of people living in cities towards self-
employment. Among randomly selected 500 persons from City-1, 400 persons were found to be
self-employed. From City-2, 800 persons were selected randomly and among them 600 persons
are self-employed. Do the data indicate that the two cities are significantly different with respect
to prevalence of self-employment among the persons? Choose the level of significance as
α = 0.05.
Solution:
Step 1 : Let PX and PY be respectively the proportions of self-employed people in City-1 and
City-2. Then, the null and alternative hypotheses are
Null hypothesis: H 0 : PX = PY
i.e., there is no significant difference between the proportions of self-employed
people in City-1 and City-2.
Alternative hypothesis: H1 : PX ≠ PY
i.e., difference between the proportions of self-employed people in City-1 and City-2
is significant. It is a two-sided alternative hypothesis.
Step 2 : Data
The given sample information are

City      Sample Size    Sample Proportion
City-1    m = 500        pX = 400/500 = 0.80
City-2    n = 800        pY = 600/800 = 0.75

Here, m ≥ 30, n ≥ 30, mpX = 400 > 5, m(1 − pX) = 100 > 5, npY = 600 > 5 and n(1 − pY)
= 200 > 5.
Step 3 : Level of significance
α = 5%
Step 4 : Test statistic
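The test statistic under the null hypothesis is
$$Z = \frac{p_X - p_Y}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}, \quad \text{where } \hat{p} = \frac{m p_X + n p_Y}{m + n} \text{ and } \hat{q} = 1 - \hat{p}.$$
The sampling distribution of Z under H0 is the N(0,1) distribution.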
Step 5 : Calculation of Test Statistic
The value of Z for the given sample information is calculated from
$$z_0 = \frac{p_X - p_Y}{\sqrt{\hat{p}\hat{q}\left(\dfrac{1}{m} + \dfrac{1}{n}\right)}}.$$
Now,
$$\hat{p} = \frac{400 + 600}{500 + 800} = \frac{1000}{1300} \approx 0.77 \quad \text{and} \quad \hat{q} \approx 0.23.$$
Thus,
$$z_0 = \frac{0.80 - 0.75}{\sqrt{(0.77)(0.23)\left(\dfrac{1}{500} + \dfrac{1}{800}\right)}} \approx 2.08.$$
Step 6 : Critical value
Since H1 is a two-sided alternative hypothesis, the critical value at 5% level of
significance is ze = 1.96.
Step 7 : Decision
Since H1 is a two-sided alternative, elements of the critical region are determined by the
rejection rule |z0| ≥ ze. Thus, it is a two-tailed test. For the given sample information,
|z0| = 2.08 > ze = 1.96. Hence, H0 is rejected. We can conclude that the difference between
the proportions of self-employed people in City-1 and City-2 is significant.
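
The pooled-proportion calculation of Example 1.13 can be reproduced as follows. This is a sketch with illustrative variable names; it keeps the pooled proportion unrounded (1000/1300), so the result may differ in the later decimal places from a hand calculation that rounds it to 0.77.

```python
from math import sqrt
from statistics import NormalDist

# Sample information from Example 1.13
m, n = 500, 800              # sample sizes for City-1 and City-2
self_emp_x, self_emp_y = 400, 600
alpha = 0.05                 # level of significance

p_x = self_emp_x / m         # sample proportion, City-1
p_y = self_emp_y / n         # sample proportion, City-2

# Pooled proportion under H0: PX = PY
p_hat = (m * p_x + n * p_y) / (m + n)
q_hat = 1 - p_hat

# Step 5: calculated value of the test statistic
z0 = (p_x - p_y) / sqrt(p_hat * q_hat * (1 / m + 1 / n))

# Step 6: critical value for the two-sided alternative
ze = NormalDist().inv_cdf(1 - alpha / 2)

# Step 7: two-tailed rejection rule |z0| >= ze
reject_h0 = abs(z0) >= ze

print(f"p_hat = {p_hat:.2f}, z0 = {z0:.2f}, ze = {ze:.2f}, reject H0: {reject_h0}")
# p_hat = 0.77, z0 = 2.08, ze = 1.96, reject H0: True
```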

