Z-Values for Normal Distribution Probabilities
We know that from a population of size N, a large number of samples (NCn), each of size n, can be
drawn. A certain statistic (like the mean or the variance) can be computed for each of these samples.
The distribution of the statistic obtained from the samples is called the sampling distribution of the
statistic. For example, if the sample size is 4 and the population size is 6, it is possible to draw 6C4 = 15
samples from the population. We may compute the mean for each sample. The fifteen sample means form a
distribution known as the sampling distribution of the mean. Likewise, the distribution of the proportions (or
percentage rates) obtained from all possible samples of the same size drawn from a population is called the
sampling distribution of the proportion.
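The enumeration described above can be checked directly. The following is a minimal sketch, assuming a made-up population of six values (the numbers are illustrative and do not come from the text); it counts the samples rather than assuming the count, and compares the mean of the sampling distribution with the population mean.

```python
from itertools import combinations
from statistics import mean

# Hypothetical population of N = 6 values (illustrative only).
population = [2, 4, 6, 8, 10, 12]
n = 4

# Every possible sample of size n drawn without replacement.
samples = list(combinations(population, n))
sample_means = [mean(s) for s in samples]

print(len(samples))         # number of distinct samples, 6C4
print(mean(sample_means))   # mean of the sampling distribution of the mean
print(mean(population))     # population mean, for comparison
```

The list `sample_means` is the sampling distribution of the mean for this small population.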
However, sampling must be based on random selection, giving an equal chance of selection to each
of the items in the universe. Only then will the representative character of the sample be ensured. Another
assumption in this regard is that the population from which the samples are drawn is infinite, as in the case of
an abstract universe constructed from the tossing of a coin or the throw of dice. But if the population is finite,
comprising, say, a number of accounts in a ledger or a number of units of a product in a godown, the number
should be fairly large, or the selection should be made with replacement, so that the selections are independent.
Standard Error
The standard deviation of the sampling distribution of a given statistic is called the standard
error of that statistic. For example, the standard deviation of the means of all possible samples of the same
size drawn from a population is called the standard error of the mean. Likewise, the standard deviation of the
proportions of all possible samples of the same size drawn from a population is called the
standard error of the proportion. The difference between the terms ‘standard deviation’ and ‘standard error’
is that the former is computed from the original values, whereas the latter is computed from the values of a
statistic obtained from samples of the original values.
(i) Point Estimate
A single value of a statistic that is used to approximate a population parameter is called a point
estimate. The statistic that one uses to obtain a point estimate is called an estimator, and the value of the
statistic is called the estimate.
For example, the sample mean X̄, which we use for estimating the population mean, is an estimator of µ.
Similarly, the statistic s² is an estimator of σ², where the value of s² is computed from a random sample.
Different samples will generally lead to different estimates. An estimator is unbiased when the expected
value of the statistic used as the estimator is equal to the value of the parameter. The expected value of the
statistic, expressed symbolically as E(statistic), is the arithmetic mean of the sampling distribution of the
statistic. Since the mean of the sampling distribution of the mean is equal to the population mean, the sample
mean is an unbiased estimator. In other words, X̄ is an unbiased estimate of µ. Likewise, since the mean of
the sampling distribution of the proportion is equal to the population proportion, the sample proportion is an
unbiased estimator, or p is an unbiased estimate of P.
However, the mean of the sampling distribution of the variance (s²) is not equal to the population
variance (σ²), so s² is a biased estimate of σ².
The value of the variance s² or σ² is computed by dividing the sum of the squared deviations from the
mean by n, the sample size, or by N, the population size, i.e.,
$$s^2 = \frac{\sum (X - \bar{X})^2}{n} \quad \text{and} \quad \sigma^2 = \frac{\sum (X - \mu)^2}{N}$$
On the other hand, if the sum of the squared deviations from the mean is divided by (n – 1) for a
sample, or by (N – 1) for a population, i.e.,
$$s^2 = \frac{\sum (X - \bar{X})^2}{n - 1} \quad \text{and} \quad \sigma^2 = \frac{\sum (X - \mu)^2}{N - 1}$$
then the mean of the sampling distribution of the modified variance s² is equal to the modified population
variance σ², i.e., the modified s² is an unbiased estimate of the modified σ².
It may be noted that when the population (N) is large or infinite, the factor N/(N – 1) approaches
unity, and the modified population variance approaches σ². Thus the modified s² is an unbiased estimate of σ²
when the population is large. Similarly, when the population and the sample are both large, the factors
n/(n – 1) and N/(N – 1) approach 1, and we may say that s² is approximately an unbiased estimate of σ².
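The effect of the divisor can be seen by enumerating every sample from a small population. The sketch below assumes a hypothetical six-value population (not taken from the text) and samples of size 4 drawn without replacement; `pvariance` divides by the number of items and `variance` divides by one less.

```python
from itertools import combinations
from statistics import mean, pvariance, variance

# Hypothetical finite population (illustrative values only).
population = [2, 4, 6, 8, 10, 12]
samples = list(combinations(population, 4))  # all samples, no replacement

# Average of the sample variances under each divisor.
avg_div_n = mean(pvariance(s) for s in samples)         # divisor n
avg_div_n_minus_1 = mean(variance(s) for s in samples)  # divisor n - 1

print(avg_div_n, pvariance(population))         # these differ: biased
print(avg_div_n_minus_1, variance(population))  # these agree: unbiased
```

With the (n – 1) and (N – 1) divisors, the average of the sample variances matches the modified population variance, which is the sense in which the modified estimator is unbiased.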
Also, when the population represents the proportion of an event, we may state the bias or absence of bias of
a sample variance for a sampling distribution of the proportion as follows:
When both population size and sample size are large, N/(N – 1) and n/(n – 1) approach 1, and pq
becomes an unbiased estimate of PQ.
of the sample is small. Once the method of computing the standard error of a statistic is known, tables of
Z-values can be used to lay down the confidence limits of the interval estimate as follows:
Upper confidence limit = X̄ + Z × S.E.
Lower confidence limit = X̄ – Z × S.E.
The following table gives some common confidence levels and their Z-values :
Confidence level   50%      68.27%   90%     95%    98%    99%    99.73%
Z                  0.6745   1.00     1.645   1.96   2.33   2.58   3.00
The confidence levels above are the same as the corresponding areas under the normal curve. The confidence
level refers to the probability that a random value drawn from the distribution will fall within these limits. A
95% confidence level indicates that there is a 95% probability that the value of the randomly drawn item lies
within the limits indicated. Alternatively, there is a 5% risk that it will fall outside the specified limits.
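The Z-values in such a table can be recomputed from the standard normal distribution. This is a small sketch using Python's `statistics.NormalDist`; the helper name `z_for_confidence` is introduced here for illustration only.

```python
from statistics import NormalDist

std_normal = NormalDist()  # mean 0, standard deviation 1

def z_for_confidence(level):
    """Two-sided Z multiplier for a confidence level given as a fraction."""
    return std_normal.inv_cdf(0.5 + level / 2)

for level in (0.50, 0.6827, 0.90, 0.95, 0.98, 0.99, 0.9973):
    print(f"{level:.2%}  Z = {z_for_confidence(level):.4f}")
```

Tabulated values are usually rounded, so small differences in the last digit are expected.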
Example 1 :
A sample survey of 81 documentaries reveals an average length of 90 minutes and a standard deviation
of 20 minutes. Find the interval estimate of the population mean at 90% and 95% confidence.
The point estimate of µ is X̄ = 90. Since the sample size is large, the population standard deviation
can be approximated by the sample standard deviation, σ = 20.
Confidence limits are X̄ ± Z × S.E., where S.E. = σ/√n = 20/√81 = 2.222.
Confidence level   Z      Z × S.E.   Lower limit   Upper limit
90%                1.64   3.64       86.36         93.64
95%                1.96   4.36       85.64         94.36
Thus the interval estimate of the mean, 86.36 to 93.64, corresponds to 90% confidence, and the interval 85.64
to 94.36 corresponds to 95% confidence. In other words, there are only 10 out of 100 chances that the true value
will fall outside the interval in the first case, and only 5 out of 100 chances that the true population mean lies
outside the interval in the second case.
It may also be noted that a wider interval is required to estimate µ with a higher degree of confidence.
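The arithmetic of Example 1 can be sketched as follows. The function name `mean_interval` is an assumption made for illustration; the Z-value 1.64 is used for the 90% level because it reproduces the limits quoted in the example.

```python
from math import sqrt

def mean_interval(x_bar, sigma, n, z):
    """Return (lower, upper) limits x_bar -/+ z * sigma / sqrt(n)."""
    se = sigma / sqrt(n)
    return x_bar - z * se, x_bar + z * se

low90, high90 = mean_interval(90, 20, 81, 1.64)
low95, high95 = mean_interval(90, 20, 81, 1.96)
print(f"90%: {low90:.2f} to {high90:.2f}")
print(f"95%: {low95:.2f} to {high95:.2f}")
```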
Example 2.
Out of a sample of 600 people, 40% preferred the TOPAZ blade and the remaining 60% preferred the
PANAMA blade. Estimate the population percentage that prefers the TOPAZ blade.
Solution : Given : n = 600, p = 40%
At the 95% level,
$$\text{Proportion limits} = p \pm 1.96\sqrt{\frac{p(100 - p)}{n}} = 40 \pm 1.96\sqrt{\frac{40 \times 60}{600}} = 40 \pm 1.96 \times 2$$
∴ Proportion limits are 40 ± 3.92 or 36.08% to 43.92%
This has been worked in percentage terms. We can use proportions by using the same data at 99%
confidence level
$$s \pm Z\,\frac{s}{\sqrt{2n}} = 100 \pm 9.8 \quad \text{or 90.2 to 109.8}$$
and confidence limits at 99% level is
Example 4 :
In a sample survey of 1,000 housewives in a city, 23 percent preferred the Hawkins pressure cooker.
Find the 99 percent confidence limits for the percentage of all housewives in the city preferring that brand of
cooker.
Solution:
Given : p = 23%, n = 1000
To set the limits, we compute the standard error :
$$\text{S.E.}(p) = \sqrt{\frac{p(100 - p)}{n}} = \sqrt{\frac{23 \times 77}{1000}} = 1.331$$
The 99% confidence limits for the percentage of housewives preferring the Hawkins cooker are :
$$p \pm 2.58\sqrt{\frac{p(100 - p)}{n}} = 23 \pm (2.58 \times 1.331) = 23 \pm 3.434$$
i.e., 19.57% to 26.43%.
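The percentage limits in Examples 2 and 4 follow the same pattern, which can be sketched as a small helper. The name `percent_interval` is hypothetical; p is expressed in percent, as in the examples.

```python
from math import sqrt

def percent_interval(p, n, z):
    """Limits for a percentage p (0-100) from a sample of size n."""
    se = sqrt(p * (100 - p) / n)   # standard error in percentage points
    return p - z * se, p + z * se

print(percent_interval(40, 600, 1.96))    # Example 2
print(percent_interval(23, 1000, 2.58))   # Example 4
```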
Example 5 :
Out of 20,000 customers’ ledger accounts, a sample of 600 accounts was taken to test the accuracy
of posting and balancing, and 45 mistakes were found. Assign limits within which the number of defective
cases can be expected at the 95% level.
Solution : Given : n = 600, p = (45/600) × 100 = 7.5%
To set the limits, we compute the standard error:
$$\text{S.E.}(p) = \sqrt{\frac{p(100 - p)}{n}} = \sqrt{\frac{7.5 \times 92.5}{600}} = 1.075$$
The 95% confidence limits for the percentage of defective cases are:
$$p \pm 1.96\sqrt{\frac{p(100 - p)}{n}} = 7.5 \pm 1.96 \times 1.075 = 7.5 \pm 2.107$$
or 5.393% to 9.607%.
Therefore, the number of defective cases may vary between 20,000 × 5.393/100 and 20,000 × 9.607/100,
i.e., between 1079 and 1921.
Example 6 :
A factory is producing 50,000 pairs of shoes daily. From a sample of 500 pairs, 2% were found to be
of sub-standard quality. Estimate the number of pairs that can be reasonably expected to be spoiled in the
daily production and assign limits at 95% level of confidence.
Solution : We calculate the standard error first.
$$\text{S.E.}(p) = \sqrt{\frac{p(100 - p)}{n}} = \sqrt{\frac{2 \times 98}{500}} = 0.6261$$
The 95% confidence limits for the percentage of defective cases are:
$$p \pm 1.96\sqrt{\frac{p(100 - p)}{n}} = 2 \pm 1.96 \times 0.6261 = 2 \pm 1.2271$$
or 0.773% to 3.227%.
The number of pairs expected to be spoiled in the daily production will be between
50,000 × 0.773/100 and 50,000 × 3.227/100, i.e., between 387 and 1614.
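Examples 5 and 6 scale percentage limits up to a count in the whole population. A minimal sketch of that step follows; the function name is an assumption for illustration. Because it keeps full precision instead of rounding the percentages first, the counts can differ from the hand-worked figures by about one unit.

```python
from math import sqrt

def expected_count_limits(p, n, z, population):
    """Scale percentage limits for p (0-100) up to a population count."""
    se = sqrt(p * (100 - p) / n)
    low, high = p - z * se, p + z * se
    return population * low / 100, population * high / 100

print(expected_count_limits(7.5, 600, 1.96, 20_000))  # Example 5
print(expected_count_limits(2, 500, 1.96, 50_000))    # Example 6
```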
(ii) Interval Estimate (Small samples):
The earlier treatment was for determining confidence intervals of the population mean µ, etc., when the size
of the sample was more than 30. But in many business situations the sample size may be smaller than 30. For laying
down confidence intervals in such cases, t values are used in place of Z values. The following table gives the
t values for different confidence levels, with reference to the degrees of freedom (n – 1), where n refers to the size
of the sample.
                        Confidence Level
Degrees of       80%      90%      95%      98%      99%
freedom (d.f.)   t0.10    t0.05    t0.025   t0.01    t0.005
2                1.886    2.920    4.303    6.965    9.925
10               1.372    1.812    2.228    2.764    3.169
20               1.325    1.725    2.086    2.528    2.845
30               1.310    1.697    2.042    2.457    2.750
∞                1.282    1.645    1.960    2.326    2.576
It may be noted that t-values become approximately equal to Z-values when (n – 1) exceeds 30.
Example 7 :
A survey of 17 agricultural labourers reveals an average income of Rs. 40 per week with a standard
deviation of Rs. 8.
Find out the limits of average weekly income of the population with a confidence of 95% (t = 2.131
for 15 d.f.)
Solution :
Lower confidence limit:
$$40 - 2.131 \times \frac{8}{\sqrt{16}} = 40 - 4.26 = 35.74$$
Upper confidence limit:
$$40 + 2.131 \times \frac{8}{\sqrt{16}} = 40 + 4.26 = 44.26$$
Hence, the average weekly income for the population may range between Rs. 35.74 and Rs. 44.26.
Example 8 :
A sample of 6 persons in an office revealed daily smoking of 10, 12, 8, 9, 16 and 5 cigarettes.
The average level of smoking in the whole office has to be estimated at the 90% level of confidence (t = 2.015 for
5 degrees of freedom).
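Solution : First of all we calculate the mean (X̄) and the standard deviation (s):
$$\bar{X} = \frac{\sum X}{n} = \frac{60}{6} = 10$$
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}} = \sqrt{\frac{70}{5}} = 3.74$$
Lower confidence limit :
$$10 - 2.015 \times \frac{3.74}{\sqrt{6}} = 10 - 3.08 = 6.92$$
Upper confidence limit :
$$10 + 2.015 \times \frac{3.74}{\sqrt{6}} = 10 + 3.08 = 13.08$$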
Hence, the average level of smoking in the whole office may range between 6.92 and 13.08.
Note : In the above example, √n has been taken in place of √(n – 1) because the bias of s has already
been removed by using the formula:
$$s = \sqrt{\frac{\sum (X - \bar{X})^2}{n - 1}}$$
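The steps of Example 8 can be sketched with the standard library. The t value is taken from the table as given in the example rather than computed; `stdev` uses the (n – 1) divisor, which matches the formula in the note.

```python
from math import sqrt
from statistics import mean, stdev

cigarettes = [10, 12, 8, 9, 16, 5]
t_value = 2.015            # tabulated t for 5 d.f. at 90% confidence

n = len(cigarettes)
x_bar = mean(cigarettes)
s = stdev(cigarettes)      # sample standard deviation, divisor n - 1
margin = t_value * s / sqrt(n)
lower, upper = x_bar - margin, x_bar + margin

print(x_bar, round(s, 2))
print(round(lower, 2), round(upper, 2))
```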
Example 9 :
A random sample of 16 items from a normal population has a mean of 53, and the sum of squares of
deviations from the mean equals 150. Can this sample be regarded as taken from a population having 56 as
its mean? Obtain the 95% and 99% confidence limits of the mean of the population.
You may use the following information from the statistical table:
For v = 15: α = 0.05, t = 2.131; α = 0.01, t = 2.947
Solution :
Null Hypothesis, µ = 56
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1} = \frac{150}{15} = 10, \qquad s = \sqrt{10} = 3.162$$
$$S_{\bar{x}} = \frac{s}{\sqrt{n}} = \frac{3.162}{\sqrt{16}} = \frac{3.162}{4} = 0.79$$
$$t = \frac{\bar{x} - \mu}{S_{\bar{x}}} = \frac{53 - 56}{0.79} = -3.79, \qquad |t| = 3.79$$
Decision. The value of t for (16–1) degrees of freedom at the 0.05 level of significance is ± 2.131.
The calculated value |t| = 3.79 is larger than the table value of t = 2.131, and so falls in the rejection region. Thus,
we reject the hypothesis that the sample is taken from a population with 56 as its mean.
(i) 95% confidence limits of the population mean:
$$\bar{x} \pm t_{0.05} \times \frac{s}{\sqrt{n}}$$
or 53 ± 2.131 × (3.162/4)
or 53 ± 1.684
or 51.316 and 54.684
Thus 51.316 < µ < 54.684.
(ii) 99% confidence limits of the population mean:
$$\bar{x} \pm t_{0.01} \times \frac{s}{\sqrt{n}}$$
or 53 ± 2.947 × (3.162/4) or 53 ± 2.33 or 50.67 and 55.33
Thus 50.67 < µ < 55.33.
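Example 9 combines a t test with two sets of confidence limits. The sketch below reproduces the arithmetic from the stated data; the tabulated t values are entered by hand, and the helper `t_limits` is a name introduced for illustration.

```python
from math import sqrt

n, x_bar, sum_sq_dev, mu0 = 16, 53, 150, 56
s = sqrt(sum_sq_dev / (n - 1))   # standard deviation with divisor n - 1
se = s / sqrt(n)                 # standard error of the mean
t_stat = (x_bar - mu0) / se

def t_limits(t_table):
    """Confidence limits for the mean using a tabulated t value."""
    return x_bar - t_table * se, x_bar + t_table * se

print(round(t_stat, 2), abs(t_stat) > 2.131)   # True means reject H0 at 0.05
print(t_limits(2.131))                         # 95% limits
print(t_limits(2.947))                         # 99% limits
```

Note that 56 lies outside both intervals, which agrees with the rejection of the null hypothesis.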
Determining a proper sample size is very important for practical use in
business, where either the standard error is known on the basis of past experience or a given level of
accuracy is desired.
If the sample size is too large, more money and time have to be spent, but the results may not be
commensurate with the cost. On the other hand, a valid conclusion may not be reached if the sample size is
too small. Hence the need for a proper sample size. The method of determining a proper size is given in the
following two cases:
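(a) Sample size for estimating a population mean.
[Figure: normal curve centred on µ, with limits µ – Zσx̄ and µ + Zσx̄]
The normal curve shows the confidence interval
$$\mu \pm Z\sigma_{\bar{x}} = \mu \pm E, \quad \text{and} \quad E = Z\sigma_{\bar{x}}$$
$$E = Z \times \frac{\sigma}{\sqrt{n}}$$
(where E = x̄ – µ, i.e., the difference between the sample mean and the
population mean).
$$n = \frac{Z^2\sigma^2}{E^2} \quad \text{or} \quad n = \left(\frac{Z\sigma}{E}\right)^2$$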
Here Z is set by the level of the confidence interval, and σ, the value of the population standard
deviation, may be the actual value, may be estimated from past experience, or may be estimated by using s,
the standard deviation of either a previous sample or a pilot study.
Example 10 :
What should be the size of the sample from a set of 2,000 accounts if the standard deviation of
default as per past experience was 2.6, a 95% confidence level is desired, and the sample mean should not differ
from the population mean by more than 0.5?
Solution : The confidence interval is µ ± 0.5
Thus E = 0.5
At the 95% confidence level, Z = 1.96
The standard deviation of the population is σ = 2.6. Substituting these values in the formula:
$$n = \frac{Z^2\sigma^2}{E^2} = \frac{(1.96)^2 (2.6)^2}{(0.5)^2} = 103.88 \text{ or } 104.$$
Example 11.
A purchaser receives articles in large batches and he wants to know the sample size he should use in
order to obtain a satisfactory estimate for the mean of each batch. One batch was exhaustively examined
and gave a distribution with standard deviation 2.85. The purchaser considered as satisfactory a knowledge
of the mean within 0.24 of its true value with probability 0.95.
Solution :
1. The confidence interval is µ ± 0.24
Thus E = 0.24
2. At 95% confidence interval, Z = 1.96
3. The standard deviation is σ = 2.85
4. Substituting the values in the formula, we get
$$n = \left(\frac{Z\sigma}{E}\right)^2 = \left(\frac{1.96 \times 2.85}{0.24}\right)^2 = (23.275)^2 = 541.73$$
Hence a sample of size 542 will give the required knowledge of the population mean.
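The sample-size formula for a mean, as used in Examples 10 and 11, can be sketched as below. The function name is hypothetical; rounding up to the next whole unit is an assumption consistent with the answers of 104 and 542 given in the examples.

```python
from math import ceil

def sample_size_for_mean(z, sigma, error):
    """Smallest whole n for which z * sigma / sqrt(n) does not exceed error."""
    return ceil((z * sigma / error) ** 2)

print(sample_size_for_mean(1.96, 2.6, 0.5))     # Example 10
print(sample_size_for_mean(1.96, 2.85, 0.24))   # Example 11
```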
(b) Sample size for estimating a population Proportion.
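The relationship between the population proportion P and the sampling distribution of the sample proportion p
may be reviewed by sketching the following normal curve:
[Figure: normal curve centred on P, with limits P – Zσp and P + Zσp]
The normal curve shows the confidence interval:
$$P \pm Z\sigma_p = P \pm E, \quad \text{where } E = Z\sigma_p$$
$$\text{Now} \quad E = Z\sigma_p = Z\sqrt{\frac{PQ}{n}}$$
(where E = p – P, i.e., the difference between the sample proportion p and the population proportion P)
$$\text{or} \quad n = \frac{Z^2 PQ}{E^2}$$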
Here the value of P, or the Population proportion may be estimated from the past experience.
Example 12 :
The sales manager of a large manufacturing company wants to check the inventory records against
the physical inventories by a sampling study. He indicates that (i) the maximum sampling error should not be
more than 10% above or below the true proportion of inaccurate records, (ii) the level of the confidence
interval should be 95%, and (iii) the proportion of inaccurate records is estimated at 25% according to past
experience. Find the sample size.
Solution :
The confidence interval is P ± 10% (∴ E = 0.10)
At the 95% confidence level, Z = 1.96.
The estimated population proportion P = 25% = 0.25 and Q = 1 – P = 0.75
Substituting the values in the formula, we get
$$n = \frac{Z^2 PQ}{E^2} = \frac{(1.96)^2 (0.25)(0.75)}{(0.10)^2} = 72.03 \approx 72$$
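The same calculation for a proportion is sketched below, with a hypothetical function name. It returns the unrounded value so that the rounding choice stays visible: the example reports 72, whereas rounding up to guarantee the stated error would give 73.

```python
def sample_size_for_proportion(z, p, error):
    """Unrounded n from n = Z^2 * P * Q / E^2, with Q = 1 - P."""
    return z ** 2 * p * (1 - p) / error ** 2

n_raw = sample_size_for_proportion(1.96, 0.25, 0.10)
print(round(n_raw, 2))
```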
Hypothesis Testing
A sampling investigation produces a result which has to be compared with the one expected on the
basis of the population parameter. Due to the laws of chance, it is possible that any particular sample may
produce a result which is out of accord with the one expected. Before acting hastily, we must try to determine
whether the difference is significant or could have arisen merely because of sampling. We must therefore subject
our judgement to an appropriate test of significance in order to see whether the disparity between the
observed and expected results arose solely due to sampling. The procedure for carrying out a significance test
is explained below :
1. Setting up Hypothesis:
In hypothesis testing, we must state the hypothesized value of the population parameter before we
begin sampling. The assumption we wish to test is called the null hypothesis denoted by H0.
Suppose we want to test the hypothesis that the population mean is equal to 120. We would write it
as follows, “The null hypothesis is that the population mean is equal to 120”.
H0 : µ = 120
If we use a hypothesized value of a population mean in a problem, we can write it as
$\mu_{H_0}$
If our sample results fail to support the null hypothesis, we reject it and accept the alternative
hypothesis, which is symbolized by H1.
H0 : µ = 120 (The null hypothesis is that the population mean is equal to 120)
We can consider three possible alternative hypotheses:
H1 : µ ≠ 120 (The alternative hypothesis is that the population mean is not equal to 120)
H1 : µ < 120 (The alternative hypothesis is that the population mean is less than 120)
H1 : µ > 120 (The alternative hypothesis is that the population mean is greater than 120)
The purpose of hypothesis testing is not to question the computed value of the sample statistic but
to make a judgement about the difference between the sample statistic and a hypothesized population
parameter.
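[Figure: sampling distribution centred on µH0, with limits marked at µH0 – 1.96 and µH0 + 1.96 (in standard-error units)]
[Figure 2, panel (a): significance level of 0.01]
value. But in the shaded area, we would reject the null hypothesis. Our significance level of 0.50 is so high that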
we would rarely accept the null hypothesis when it is not true but, at the same time, often reject it when it is
true.
Figure 2
3. Type I and Type II Errors
Rejecting a null hypothesis when it is true is called a Type I error, and its probability is symbolized by
α (alpha). Alternatively, accepting a null hypothesis when it is false is called a Type II error, and its probability
is symbolized by β (beta). There is a trade-off between these two errors. The probability of making one type
of error can be reduced only if we are willing to increase the probability of making the other type of error.
Notice in part c, Figure 2 that our acceptance region is quite small (0.50 of the area under the curve). With a
small acceptance region, we will rarely accept a null hypothesis when it is not true, but we will often reject
a null hypothesis when it is true. In other words, in order to get a low β, we will have to put up with a high α. To deal with
this trade-off in personal and professional situations, decision makers decide the appropriate level of significance
by examining the costs attached to both types of errors.
Suppose that making a Type I error (rejecting a null hypothesis when it is true) involves the time and
trouble of reworking a batch of chemicals that should have been accepted. At the same time, making a Type
II error (accepting a null hypothesis when it is false) means taking a chance that an entire group of users of
this chemical compound will be poisoned. Obviously, the management of this company will prefer a Type I
error to a Type II error and will set very high levels of significance in its testing to get a low β.
On the other hand, suppose that making a Type I error involves disassembling an entire engine at the factory, but
making a Type II error involves relatively inexpensive warranty repairs by the dealers. Then the manufacturer will
prefer a Type II error and will set lower significance levels in its testing.
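The trade-off between α and β can be made concrete with a numerical sketch. All figures here are assumptions chosen for illustration (a right-tailed test of H0: µ = 100 when the true mean is 103 and the standard error is 1.5); none of them come from the text.

```python
from statistics import NormalDist

std_normal = NormalDist()

def beta_for_alpha(alpha, mu0, mu_true, se):
    """Type II error probability for a right-tailed Z test of H0: mu = mu0."""
    critical_mean = mu0 + std_normal.inv_cdf(1 - alpha) * se
    # H0 is accepted whenever the sample mean falls below the critical value.
    return NormalDist(mu_true, se).cdf(critical_mean)

alphas = (0.01, 0.05, 0.10, 0.50)
betas = [beta_for_alpha(a, mu0=100, mu_true=103, se=1.5) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  beta = {b:.3f}")
```

As the significance level rises, the printed β falls, which is the pattern described above.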
[Figure: sampling distribution centred on µH0, with the acceptance region marked]
H0 : µ = 850. Because he does not want to deviate significantly from 850 hours in either direction, the
appropriate alternative hypothesis is H1 : µ ≠ 850, and he uses a two-tailed test. Thus he rejects the null
hypothesis if the mean life of bulbs in the sample is either too far above 850 hours or too far below 850 hours.
However, there are situations in which a two-tailed test is not appropriate, and we must use a one-tailed
test. Consider the case of a wholesaler who buys bulbs from manufacturers. The wholesaler buys bulbs
in large lots and does not want to accept a lot of bulbs unless their mean life is at least 850 hours. As each
shipment arrives, the wholesaler tests a sample to decide whether he should accept the shipment. The
wholesaler will reject the shipment only if he feels that the mean life is below 850 hours. If he feels that the
bulbs are better than expected (with a mean life above 850 hours), he will not reject the shipment, because
the longer life comes at no extra cost. So the wholesaler’s hypotheses are H0 : µ = 850 hours and H1 : µ < 850
hours. He rejects H0 only if the mean life of the sampled bulbs is significantly below 850 hours. From figure
4, we can see why this test is called a left-tailed test (one-tailed test).
[Figure 4: left-tailed test with the distribution centred at 850 hours. If the sample mean falls in the acceptance region, we would accept the null hypothesis.]
Then we standardize the sample proportion by dividing the difference between the observed sample
proportion, p̄, and the hypothesized proportion, pH0, by the standard error of the proportion, σp̄:
$$z = \frac{\bar{p} - p_{H_0}}{\sigma_{\bar{p}}}$$
By marking the calculated standardized sample proportion, –3.268, on a graph of the sampling
distribution, it can be seen that this falls outside the region of acceptance, as shown in figure 1.
Figure 1
Therefore, the president should reject the null hypothesis and conclude that there is a significant
difference between the director of human resources’ hypothesized proportion of promotable employees,
0.75, and the observed proportion of promotable employees in the sample. From this, he should infer that
the true proportion of promotable employees in the company is not 75 percent.
One-Tailed Tests of Proportions
A one-tailed test of a proportion is similar to one-tailed test of a mean.
Example 2:
A member of a public interest group concerned with pollution makes a statement at a public meeting that
“fewer than 55 percent of the industrial plants in this area are complying with air-pollution standards.” An
officer of the pollution protection agency believes that 55 percent of the plants are complying with the
standards; he decides to test this hypothesis at the 0.02 significance level.
H0 : p = 0.55 (Null hypothesis: The proportion of plants complying with the air-pollution standards
is 0.55)
H1 : p < 0.55 (Alternative hypothesis: The proportion complying with the air-pollution standards
is less than 0.55)
α = 0.02 (Level of significance)
The officer thoroughly checks the records in his office. He takes a sample of 60 plants
from a population of over 10,000 plants and finds that 30 are complying with air-pollution standards. Is the
statement made by the member of the public interest group valid or not?
pH0 = 0.55 (Population proportion that is complying with air-pollution standards)
qH0 = 0.45 (Population proportion that is not complying with air-pollution standards)
n = 60 (Sample size)
p̄ = 30/60, or 0.50 (Sample proportion complying)
q̄ = 30/60, or 0.50 (Sample proportion polluting)
This is a one-tailed test. The officer wants to find out whether the actual proportion is less than
0.55. In order to reject the null hypothesis that the true proportion of plants in compliance is 55 percent, the
officer must accept the alternative hypothesis that fewer than 55 percent have complied. In Figure 2 we
have shown this hypothesis test graphically.
Figure 2
Because np and nq are each over 5, we can use the normal approximation of the
binomial distribution. The critical value of z from the statistical table for 0.48 of the area under the curve
is 2.05.
Next we can calculate the standard error of the proportion using the hypothesized population proportion
as follows:
$$\sigma_{\bar{p}} = \sqrt{\frac{p_{H_0} q_{H_0}}{n}} = \sqrt{\frac{0.55 \times 0.45}{60}} = 0.0642$$
We standardize the sample proportion by dividing the difference between the observed sample
proportion, p̄, and the hypothesized proportion, pH0, by the standard error of the proportion:
$$z = \frac{\bar{p} - p_{H_0}}{\sigma_{\bar{p}}} = \frac{0.50 - 0.55}{0.0642} = -0.78$$
Figure 3 illustrates where the sample proportion lies in relation to the critical value, –2.05. Looking at this
figure, we can see that the sample proportion lies within the acceptance region.
Figure 3
Therefore, the officer of the pollution protection agency should accept the null hypothesis that the true
proportion of complying plants is 0.55. Although the observed sample proportion is below 0.55, it is
not far enough below 0.55 to make us accept the statement by the member of the public interest group.
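The left-tailed test in this example can be sketched as follows. The function name `proportion_z` is an assumption for illustration, and the critical value is computed from the normal distribution rather than read from a table.

```python
from math import sqrt
from statistics import NormalDist

def proportion_z(p_bar, p0, n):
    """Standardized sample proportion under H0: p = p0."""
    se = sqrt(p0 * (1 - p0) / n)   # standard error uses the hypothesized p0
    return (p_bar - p0) / se

z = proportion_z(30 / 60, 0.55, 60)
critical = NormalDist().inv_cdf(0.02)   # left-tail cutoff for alpha = 0.02
reject = z < critical

print(round(z, 2), round(critical, 2), reject)
```

Here `reject` is false because the standardized value lies to the right of the critical value.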
Example 1:
A ketchup manufacturer is in the process of deciding whether to produce a new extra-spicy brand.
The company’s marketing-research department used a national telephone survey of 6500 households and
found that the extra-spicy ketchup would be purchased by 350 of them. A much more extensive study made
2 years ago showed that 5 percent of the households would purchase the brand then. At the 2 percent significance
level, should the company conclude that there is an increased demand for the extra-spicy flavor?
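Answer: n = 6500, p̄ = 350 / 6500 = 0.05385
H0 : p = 0.05, H1 : p > 0.05, α = 0.02
The upper limit of the acceptance region is z = 2.05.
$$z = \frac{\bar{p} - p_{H_0}}{\sqrt{p_{H_0} q_{H_0} / n}} = \frac{0.05385 - 0.05}{\sqrt{0.05 \times 0.95 / 6500}} = \frac{0.00385}{0.002703} = 1.42$$
Figure 4
Because z = 1.42 < 2.05, we should accept H0. The current interest is not significantly greater than the interest
of 2 years ago.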
Example 2:
A company sells lawn mowers in its hardware store, and is interested in comparing the reliability of
the mowers it sells with the reliability of mowers sold by other companies. The company knows that only 15
percent of all mowers sold require repairs during the first year of ownership. A sample of 120 of its customers
revealed that exactly 22 of them required mower repairs in the first year of ownership. At the 0.05 level of
significance, is there evidence that the company’s mowers differ in reliability from those sold by other companies?
Answer:
Figure 5
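One way to carry out the two-tailed test for the lawn-mower problem from its stated data is sketched below. This is an illustration under the usual normal approximation, not a reproduction of the original worked answer.

```python
from math import sqrt
from statistics import NormalDist

n, repairs, p0, alpha = 120, 22, 0.15, 0.05
p_bar = repairs / n                              # observed sample proportion
se = sqrt(p0 * (1 - p0) / n)                     # standard error under H0
z = (p_bar - p0) / se
critical = NormalDist().inv_cdf(1 - alpha / 2)   # two-tailed cutoff

print(round(z, 2), round(critical, 2), abs(z) > critical)
```

If the last printed value is `False`, the standardized proportion lies inside the acceptance region at the 0.05 level.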
0.05 level whether there is a difference between the efficacies of these two drugs.
Given: p̄1 = 0.69 (Sample proportion of successes with drug 1)
q̄1 = 0.31 (Sample proportion of failures with drug 1)
n1 = 110 (Sample size for testing drug 1)
p̄2 = 0.667 (Sample proportion of successes with drug 2)
q̄2 = 0.333 (Sample proportion of failures with drug 2)
n2 = 90 (Sample size for testing drug 2)
H0 : p1 = p2 (Null hypothesis: There is no difference between these drugs)
H1 : p1 ≠ p2 (Alternative hypothesis: There is a difference between them)
α = 0.05 (Level of significance)
Because the management of the pharmaceutical company wants to know whether there is a difference
between the two compounds, this is a two-tailed test. The significance level of 0.05 corresponds to
the shaded regions in the figure. Both samples are large enough to justify using the normal distribution to
approximate the binomial. From the statistical table, we can determine that the critical value of z for 0.475 of
the area under the curve is 1.96.
Figure 6
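We can begin by calculating the standard error of the sampling distribution we are using in our
hypothesis test. In this example, the binomial distribution is the correct sampling distribution.
To test the two compounds, we do not know the population parameters p1, p2, q1 and q2, and thus
we need to estimate them from the sample statistics p̄1, p̄2, q̄1 and q̄2. In this case we will use the Estimated
Standard Error of the difference between two proportions:
$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\bar{p}_1 \bar{q}_1}{n_1} + \frac{\bar{p}_2 \bar{q}_2}{n_2}}$$
If we hypothesize that there is no difference between the two population proportions, then our best
estimate of the overall population proportion of successes is probably the combined proportion of successes
in both samples, that is:
$$\hat{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{(110)(0.69) + (90)(0.667)}{110 + 90} = 0.6797$$
∴ Estimated Standard Error of the difference between two Proportions using combined Estimates
from both Samples:
$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}} = \sqrt{\frac{(0.6797)(0.3203)}{110} + \frac{(0.6797)(0.3203)}{90}} = 0.0663$$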
Figure 7
We standardize the difference between the two observed sample proportions, p̄1 – p̄2, by dividing by the
estimated standard error of the difference between two proportions:
$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)_{H_0}}{\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}} = \frac{(0.69 - 0.667) - 0}{0.0663} = 0.35$$
We can see in figure 7 that the standardized difference between the two sample proportions lies
within the acceptance region. Thus, we accept the null hypothesis and conclude that these two new
compounds produce effects on blood pressure that are not significantly different from each other.
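The two standard-error estimates discussed for the drug comparison can be placed side by side. The sketch below uses the sample figures given above; the variable names are introduced here for illustration.

```python
from math import sqrt

p1, n1 = 0.69, 110    # drug 1: sample proportion of successes, sample size
p2, n2 = 0.667, 90    # drug 2

# Estimate built from each sample's own proportion.
se_separate = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)

# Pooled estimate, used when H0 states that p1 equals p2.
p_hat = (n1 * p1 + n2 * p2) / (n1 + n2)
se_pooled = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))

z = (p1 - p2) / se_pooled
print(round(p_hat, 4), round(se_separate, 4), round(se_pooled, 4), round(z, 2))
```

The two estimates are close here because the sample proportions are similar; the pooled form is the one that matches the null hypothesis of no difference.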
One-Tailed Test for Difference between Proportions
Example 2:
Delhi Government has been using two methods of listing property for tax purposes. The first requires
the property owner to appear in person before a tax lister, but the second permits the property owner to mail
in a tax form. The city manager thinks the personal-appearance method produces far fewer mistakes than
the mail-in method. She authorizes an examination of 100 personal-appearance listings and 150 mail-in
listings. Ten per cent of the personal-appearance forms contain errors; 13.3 percent of the mail-in forms contain
errors. The results of the sample are summarized below:
Given: p̄1 = 0.10 (Proportion of personal-appearance forms with errors)
q̄1 = 0.90 (Proportion of personal-appearance forms without errors)
n1 = 100 (Sample size of personal-appearance forms)
p̄2 = 0.133 (Proportion of mail-in forms with errors)
q̄2 = 0.867 (Proportion of mail-in forms without errors)
n2 = 150 (Sample size of mail-in forms)
The House tax officer wants to test at the 0.10 level of significance the hypothesis that the
personal-appearance method produces a lower proportion of errors.
H0 : p1 = p2 (Null hypothesis: There is no difference between the two methods)
H1 : p1 < p2 (Alternative hypothesis: The personal-appearance method has a lower
proportion of errors than the mail-in method)
α = 0.10 (Level of significance for testing the hypothesis)
With samples of this size, we can use the standard normal distribution and the statistical table to
determine the critical value of z for 0.40 of the area under the curve (0.50 – 0.10). We can use this value, 1.28,
as the boundary of the acceptance region.
Because the House tax officer wishes to test whether the personal-appearance listing is better than the
mailed-in listing, the appropriate test is a one-tailed test. It is a left-tailed test, because to reject the null
hypothesis, the test result must fall in the shaded portion of the left tail, indicating that significantly fewer
errors exist in the personal-appearance forms.
To estimate the standard error of the difference between two proportions, we first use the
combined proportions from both samples to estimate the overall proportion of successes:
$$\hat{p} = \frac{n_1 \bar{p}_1 + n_2 \bar{p}_2}{n_1 + n_2} = \frac{(100)(0.10) + (150)(0.133)}{100 + 150} = 0.1198$$
Figure 8
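Now p̂ can be used to calculate the estimated standard error of the difference between the two
proportions by using the following equation:
$$\hat{\sigma}_{\bar{p}_1 - \bar{p}_2} = \sqrt{\frac{\hat{p}\hat{q}}{n_1} + \frac{\hat{p}\hat{q}}{n_2}} = \sqrt{\frac{(0.1198)(0.8802)}{100} + \frac{(0.1198)(0.8802)}{150}} = 0.0419$$
We use the estimated standard error of the difference to convert the observed difference between
the two proportions, p̄1 – p̄2, to a standardized value:
$$z = \frac{(\bar{p}_1 - \bar{p}_2) - (p_1 - p_2)_{H_0}}{\hat{\sigma}_{\bar{p}_1 - \bar{p}_2}} = \frac{(0.10 - 0.133) - 0}{0.0419} = -0.79$$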
Figure 9
Figure 9 shows us that the standardized difference between the two sample proportions lies well
within the acceptance region, and the city manager should accept the null hypothesis that there is no
difference between the two methods of tax listing. Therefore, if mailed-in listing is considerably less
expensive to the Government, then the House tax officer should consider increasing the use of this
method.
Example 3:
A large hotel chain is trying to decide whether to convert more of its rooms to nonsmoking rooms. In
a random sample of 400 guests last year, 166 had requested non-smoking rooms. This year, 205 guests in a
sample of 380 preferred the non-smoking rooms. Would you recommend that the hotel chain convert more
rooms to non-smoking ? Support your recommendation by testing the appropriate hypotheses at a 0.01 level
of significance.
Answer:
Figure 10
Because z = –3.48 < –2.33, we reject H0. Therefore, the hotel chain should convert more rooms to
non-smoking, because there was a significant increase in the proportion of guests requesting these rooms
over the last year.
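The z value quoted for the hotel example can be recovered from the raw counts. The sketch below assumes the pooled standard error described earlier and a left-tailed test (last year's proportion minus this year's); the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist

def pooled_z_from_counts(x1, n1, x2, n2):
    """z statistic for H0: p1 = p2, using the pooled proportion."""
    p1, p2 = x1 / n1, x2 / n2
    p_hat = (x1 + x2) / (n1 + n2)
    se = sqrt(p_hat * (1 - p_hat) * (1 / n1 + 1 / n2))
    return (p1 - p2) / se

z = pooled_z_from_counts(166, 400, 205, 380)   # last year vs this year
critical = NormalDist().inv_cdf(0.01)          # left-tail cutoff, alpha = 0.01

print(round(z, 2), round(critical, 2), z < critical)
```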
Example 4:
Two different areas of a large eastern city are being considered as sites for day-care centres of 200
households surveyed in one secton, the proportion in which the mother worked full-time was 0.52. In another
section, 40 percent of the 150 households surveyed had mothers working at fulltime jobs. At the 0.04 level of
significance, is there a significant difference in the proportions of working mothers in the two areas of the
city?
Answer: = 0.40
H0 : p1 = p2, H1 : p1 ≠ p2 , α = 0.04
Figure 11
Because z = 2.23 > 2.05, we reject H0. The proportions of working mothers in the two areas differ
significantly.
LESSON 3
Figure 1
and $\mu_{H_0} - 2.58\,\sigma_{\bar{x}}$ = 80,000 – 2.58 (342.94)
= 80,000 – 884.78 = 79,115.22 pounds per square inch (Lower limit)
Note that we have defined the limits of the acceptance region (80,884.78 and 79,115.22) and the
sample mean (79,500) and illustrated them in figure 2 in the scale of the original variable (pounds per
square inch).
Figure 2
The sample mean lies within the acceptance region; the manufacturer should accept the null
hypothesis because there is no significant difference between the hypothesized mean of 80,000 and the
sample mean of 79,500.
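We use the following equation to standardize the sample mean, $\bar{x}$, by subtracting $\mu_{H_0}$ (the
hypothesized mean) and dividing by $\sigma_{\bar{x}}$ (the standard error of the mean).
$z = \dfrac{\bar{x} - \mu_{H_0}}{\sigma_{\bar{x}}}$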
Figure 3
Figure 4
Placing the standardized value on the z scale shows that this sample mean falls well outside the
acceptance region, as shown in figure 4.
Therefore, the hospital should reject the null hypothesis because the observed mean of the sample is
significantly lower than our hypothesized mean of 100 cc. On the basis of this sample of 100 doses, the
hospital should conclude that the doses in the shipment are not sufficient.
Example 3:
Hinton Press hypothesizes that the average life of its largest web press is 14,500 hours. They know
that the standard deviation of press life is 2,100 hours. From a sample of 50 presses, the company finds a
sample mean of 13,000 hours. At a 0.01 significance level, should the company conclude that the average life
of the presses is less than the hypothesized 14,500 hours?
Solution:
Figure 5
Because z = –4.2257 < –2.33, we should reject H0. The average life is significantly less than the
hypothesized value.
Example 4:
Bombay Theaters know that a certain hit movie ran an average of 84 days in each city, and the
corresponding standard deviation was 10 days. The manager was interested in comparing the movie’s
popularity in this region with other regions. He randomly chose 75 theaters in his region and found that
they ran the movie an average of 81.5 days.
(a) State appropriate hypotheses for testing whether there was a significant difference in the length
of the picture’s run between theaters in his region and other regions.
(b) At a 1 per cent significance level, test these hypotheses.
Solution: σ = 10, n = 75, $\bar{x}$ = 81.5
(a) H0 : µ = 84, H1 : µ ≠ 84
(b) α = 0.01
Figure 6
The limits of the acceptance region are z = ± 2.58.
Because z = –2.17 is in the acceptance region, we accept H0. Therefore we conclude that the length of
the movie’s run in this region is not significantly different from that in other regions.
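The calculation in Example 4 can be reproduced as follows. This is a sketch under the assumption that the population standard deviation is known (as the problem states); the helper name `one_sample_z` is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def one_sample_z(x_bar, mu_h0, sigma, n):
    """Return (standard error of the mean, z) when sigma is known."""
    se = sigma / sqrt(n)
    return se, (x_bar - mu_h0) / se


# Example 4: hypothesized mean 84 days, sigma = 10, n = 75, sample mean 81.5
se, z = one_sample_z(x_bar=81.5, mu_h0=84, sigma=10, n=75)

alpha = 0.01
# Two-tailed test: alpha / 2 in each tail, so the critical value is about 2.58
z_crit = NormalDist().inv_cdf(1 - alpha / 2)
accept_h0 = -z_crit <= z <= z_crit

# The same acceptance region expressed in days (scale of the original variable)
lower, upper = 84 - z_crit * se, 84 + z_crit * se

print(round(z, 2), round(z_crit, 2), accept_h0)
```

Expressing the limits both on the z scale and in days mirrors the two ways the lesson presents an acceptance region.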
Test for Difference between Large sample Means
Example 5:
When both sample sizes are greater than 30, the example given below illustrates how to do a
two-tailed test of a hypothesis about the difference between two means. A manpower-development
statistician is asked to determine whether the hourly wages of semi-skilled workers are the same in two
cities. The results of this survey are presented below:
Data from a Sample Survey of Hourly Wages
City    Mean Hourly Earnings    Standard Deviation    Sample Size
A       8.95                    0.40                  200
B       9.10                    0.60                  175
Figure 7
Suppose the company wants to test the hypothesis at the 0.05 level of significance that there is no
difference between hourly wages for semi-skilled workers in the two cities:
H0 : µ1 = µ2 (Null hypothesis: there is no difference)
H1 : µ1 ≠ µ2 (Alternative hypothesis: there is a difference)
α = 0.05 (Level of significance)
Because the company is interested to know whether the means are equal or not, this is a two-tailed
test.
In figure 8, the significance level of 0.05 corresponds to the two shaded areas, each of which
contains 0.025 of the area. The acceptance region contains two equal areas of 0.475 each. Because both
samples are large, we can use the normal distribution. From the statistical table, we can determine the
critical value of z for 0.475 of the area under the curve to be 1.96.
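The standard deviations of the two populations are not known. Therefore, our first step is to estimate
them from the sample standard deviations:
$\hat{\sigma}_1 = s_1 = 0.40$ and $\hat{\sigma}_2 = s_2 = 0.60$
Now the estimated standard error of the difference between the two means can be determined by
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\hat{\sigma}_1^2}{n_1} + \dfrac{\hat{\sigma}_2^2}{n_2}} = \sqrt{\dfrac{(0.40)^2}{200} + \dfrac{(0.60)^2}{175}} \approx 0.0535$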
Figure 8
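We standardize the difference of sample means, $\bar{x}_1 - \bar{x}_2$. First, we subtract $(\mu_1 - \mu_2)_{H_0}$, the
hypothesized difference of the population means. Then, we divide by $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}$, the estimated standard
error of the difference between the sample means.
$z = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(8.95 - 9.10) - 0}{0.0535} \approx -2.8$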
We mark the standardized difference on a graph of the sampling distribution and compare with the
critical value, as shown in figure 8. Figure 8 exhibits that the standardized difference between the two
sample means lies outside the acceptance region. Thus we reject the null hypothesis of no difference and
conclude that the population means (the average semi-skilled wages in these two cities) differ.
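The wage comparison can be verified numerically. The sketch below assumes, as the text does, that both samples are large enough for the sample standard deviations to stand in for the population values; `two_sample_z` is an illustrative name.

```python
from math import sqrt
from statistics import NormalDist


def two_sample_z(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """Return (estimated standard error, z) for two large independent samples."""
    # Sample standard deviations are used as estimates of the population values
    se = sqrt(s1**2 / n1 + s2**2 / n2)
    z = ((x1 - x2) - hypothesized_diff) / se
    return se, z


# City A: mean 8.95, sd 0.40, n = 200; City B: mean 9.10, sd 0.60, n = 175
se, z = two_sample_z(8.95, 0.40, 200, 9.10, 0.60, 175)

# Two-tailed test at alpha = 0.05: 0.025 in each tail, critical value about 1.96
z_crit = NormalDist().inv_cdf(0.975)
reject_h0 = abs(z) > z_crit

print(round(se, 4), round(z, 2), reject_h0)
```

The `hypothesized_diff` parameter covers the case where the null hypothesis specifies a non-zero difference, for example – 0.10 for wages that are ten paise lower in city A.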
In most of the examples, we will be testing whether two populations have the same means. Because
of this, $(\mu_1 - \mu_2)_{H_0}$, the hypothesized difference between the two means, was zero. However, we can also
investigate whether the average wages are ten paise per hour lower in city A than in B. In such a case our
hypotheses would have been:
H0 : µ1 = µ2 – 0.10 (Null hypothesis : wages are 0.10 lower in A than in B)
H1 : µ1 ≠ µ2 – 0.10 (Alternative hypothesis : wages are not 0.10 lower in A than in B)
In this case, the hypothesized difference between the two means would be $(\mu_1 - \mu_2)_{H_0}$ = – 0.10, and the
standardized difference between the sample means would be
$z = \dfrac{(8.95 - 9.10) - (-0.10)}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \dfrac{-0.05}{0.0535} \approx -0.93$
For a one-tailed version of this test, the alternative hypothesis would instead be
H1 : µ1 < µ2 – 0.10 (Alternative hypothesis : wages are more than 0.10 lower in A than in B)
Example 6:
Two independent samples of observations were collected. For the first sample of 50 elements, the
mean was 85 and the standard deviation 5. The second sample of 100 elements had a mean of 80 and a
standard deviation of 10.
(a) Compute the estimated standard error of the difference between the two means.
(b) Using α = 0.01, test whether the two samples can reasonably be considered to have come from
populations with the same mean.
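Answer : s1 = 5, n1 = 50, $\bar{x}_1$ = 85; s2 = 10, n2 = 100, $\bar{x}_2$ = 80
(a) $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{s_1^2}{n_1} + \dfrac{s_2^2}{n_2}} = \sqrt{\dfrac{25}{50} + \dfrac{100}{100}} = 1.2247$
(b) H0 : µ1 = µ2, H1 : µ1 ≠ µ2, α = 0.01
The limits of the acceptance region are z = ± 2.58
z value = (85 – 80) / 1.2247 = 4.08
Figure 9
Because z = 4.08 > 2.58, we reject H0. It is reasonable to conclude that the two samples come from
different populations.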
Example 7:
In 2000 the Financial Accounting Standards Board (FASB) was considering a proposal to require
companies to report the potential effect of employee stock options on earnings per share (EPS). A
random sample of 40 high-technology firms revealed that the new proposal would reduce EPS by an
average of 14 per cent, with a standard deviation of 18 per cent. A random sample of 35 producers of
consumer goods showed that the proposal would reduce EPS by 9 per cent on average, with a standard
deviation of 8.5 per cent. On the basis of these samples, is it reasonable to conclude (at α = 0.10) that the
FASB proposal will cause a greater reduction in EPS for high-technology firms than for producers of
consumer goods?
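Answer: Sample 1 (HT firms): s1 = 18%, n1 = 40, $\bar{x}_1$ = 14%
Sample 2 (CG producers): s2 = 8.5%, n2 = 35, $\bar{x}_2$ = 9%
H0 : µ1 = µ2, H1 : µ1 > µ2, α = 0.10
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{(18)^2}{40} + \dfrac{(8.5)^2}{35}} = 3.188$ %
The upper limit of the acceptance region is z = 1.28
z value = (14 – 9) / 3.188 = 1.568
Figure 10
Because z = 1.568 > 1.28, we reject H0. It is reasonable to believe that the FASB proposal will cause
a greater reduction in EPS for high-technology firms than for producers of consumer goods.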
LESSON 4
When the sample sizes are small (n < 30), there are two changes in our procedure for testing the
differences between means. The first involves the way we compute the estimated standard error of the
difference between the two sample means. Secondly, we will base our small-sample tests on the
t distribution in place of the normal distribution.
Example 1:
Suppose a company is investigating two education programs for increasing the sensitivity of its managers.
The original program consisted of several informal question-and-answer sessions with leaders. Over the past
few years, a program involving formal classroom contact with professional psychologists and sociologists has
been developed. The new program is considerably more expensive, and the president wants to test at the
0.05 level of significance whether this expenditure has resulted in greater sensitivity. We can test the
following:
H0 : µ1 = µ2 (Null hypothesis : There is no difference in sensitivity levels of two programs)
H1 : µ1 > µ2 (Alternative hypothesis: The new program results in higher sensitivity levels)
α = 0.05 (Level of significance)
The table below contains the data resulting from a sample of the managers trained in both programs.
Because only limited data are available for the two programs, the population standard deviations are
estimated from the data. The sensitivity level is measured as a percentage on a standard psychometric scale.
Program     Number of Managers    Mean Sensitivity      Standard Deviation of Sensitivity
            Observed              after this Program    after this Program
Formal      12                    90%                   15%
Informal    15                    85%                   20%
The company wishes to test whether the sensitivity achieved by the new program is significantly
higher than that achieved under the older program. To reject the null hypothesis, the observed difference
of sample means would need to fall sufficiently high in the right tail of the distribution. Then we would
accept the alternative hypothesis that the new program leads to higher sensitivity levels and that the extra
expenditures on this program are justified.
The second step for hypothesis testing requires us to choose the appropriate distribution and find the
critical value. To do so, we first need to consider how to calculate the standard error of the difference
between the two means. Because the population standard deviations are not known, we start from the
following equation:
$\sigma_{\bar{x}_1 - \bar{x}_2} = \sqrt{\dfrac{\sigma_1^2}{n_1} + \dfrac{\sigma_2^2}{n_2}}$
Where the sample sizes were large (both greater than 30), we used this equation and estimated $\sigma_1^2$ by
$s_1^2$, and $\sigma_2^2$ by $s_2^2$. Now, with small sample sizes, that procedure is not appropriate. If we can assume that
the unknown population variances are equal, we can continue.
Assuming that $\sigma_1^2 = \sigma_2^2$, how can we estimate the common variance $\sigma^2$? We could use either $s_1^2$ or $s_2^2$
alone, but then we don’t use all the information available to us because we ignore one of the samples.
Instead we use a weighted average of $s_1^2$ and $s_2^2$, and the weights are the numbers of degrees of freedom in
each sample. This weighted average is called a “pooled estimate” of $\sigma^2$ and is given by:
$s_p^2 = \dfrac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$
Because we have to use the sample variances to estimate the unknown $\sigma^2$, the test will be based on the
t distribution. This is just like the test of a single mean from a sample of size n when we didn’t know $\sigma^2$.
There we used a t distribution with n – 1 degrees of freedom, because once we knew the sample mean
only n – 1 of the sample observations could be freely specified. Here we have n1 – 1 degrees of freedom
in the first sample and n2 – 1 degrees of freedom in the second sample, so when we pool them to estimate $\sigma^2$,
we wind up with n1 + n2 – 2 degrees of freedom. Hence the appropriate sampling distribution for our test
of the two sensitivity programs is the t distribution with 12 + 15 – 2 = 25 degrees of freedom. Because we
are doing an upper-tailed test at a 0.05 significance level, the critical value of t is 1.708.
Now that we have the critical value for our hypothesis test, we can illustrate the same graphically in
figure 1. The shaded region at the right of the distribution represents the 0.05 significance level of our test.
Figure 1
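Continuing, we plug the formula for $s_p^2$ from the earlier equation into the standard-error equation and
simplify to get an equation for the estimated standard error of $\bar{x}_1 - \bar{x}_2$ with small samples and equal
population variances:
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}}$
For the two programs, $s_p^2 = \dfrac{(12 - 1)(15)^2 + (15 - 1)(20)^2}{12 + 15 - 2} = 323$, so $s_p$ = 17.97 and
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = 17.97\sqrt{\dfrac{1}{12} + \dfrac{1}{15}}$ = 6.96.
Next we standardize the difference of sample means, $\bar{x}_1 - \bar{x}_2$. First, we subtract $(\mu_1 - \mu_2)_{H_0}$, the
hypothesized difference of the population means. Then we divide by $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}$, the estimated standard error
of the difference between the sample means.
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \dfrac{(90 - 85) - 0}{6.96} \approx 0.72$
Because our test of hypothesis is based on the t distribution, we use t to denote the standardized
statistic.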
Figure 2
Then we mark the standardized difference on a graph of the sampling distribution and compare it
with the critical value of t = 1.708. We can see in figure 2 that the standardized difference between the
two sample means lies within the acceptance region. Thus, we accept the null hypothesis that there is no
difference between the sensitivities achieved by the two programs. The company’s expenditures on the
formal instructional program have not produced significantly higher sensitivities among its managers.
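The pooled-variance procedure for the two sensitivity programs can be sketched in code. The function name `pooled_t` is hypothetical, and the critical value is taken from the t table as quoted in the text, because the Python standard library has no t distribution.

```python
from math import sqrt


def pooled_t(x1, s1, n1, x2, s2, n2, hypothesized_diff=0.0):
    """Return (pooled sd, standard error, t, df), assuming equal variances."""
    df = n1 + n2 - 2
    # Weighted average of the sample variances; weights are degrees of freedom
    sp2 = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / df
    sp = sqrt(sp2)
    se = sp * sqrt(1 / n1 + 1 / n2)
    t = ((x1 - x2) - hypothesized_diff) / se
    return sp, se, t, df


# Formal program: mean 90, sd 15, n = 12; informal program: mean 85, sd 20, n = 15
sp, se, t, df = pooled_t(90, 15, 12, 85, 20, 15)

t_crit = 1.708  # upper-tailed, alpha = 0.05, 25 df (table value quoted in the text)
reject_h0 = t > t_crit

print(df, round(sp, 2), round(se, 2), round(t, 2), reject_h0)
```

With these inputs the statistic comes out near 0.72, well below 1.708, so the null hypothesis is not rejected. The same function applies to any small-sample comparison where equal population variances can be assumed.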
Example 2: The Dean of Students of a Department is wondering about grade distributions at University
schools. He has heard grumblings that the GPAs in the Business School are about 0.25 lower than those in
the College of Arts and Sciences. A random sample produced the following GPAs.
Business: 2.86 2.77 3.18 2.80 3.14 2.87 3.19 3.24 2.91 3.00 2.83
Arts & Sciences: 3.35 3.32 3.36 3.63 3.41 3.37 3.45 3.43 3.44 3.17 3.26 3.18 3.41
Do these data indicate that there is a factual basis for the students’ grumblings? State and test
appropriate hypotheses at α = 0.02.
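Answer: Sample 1 (Business) : sB = 0.176, nB = 11, $\bar{x}_B$ = 2.98
Sample 2 (Arts & Sciences): sA = 0.121, nA = 13, $\bar{x}_A$ = 3.368
H0 : µB – µA = – 0.25, H1 : µB – µA ≠ – 0.25, α = 0.02
The limits of the acceptance region are t = ± 2.508 (22 degrees of freedom).
t value = – 2.268
Figure 3
Because – 2.268 > – 2.508, we accept H0.
The data are consistent with Business School GPAs being about 0.25 below those in the College of Arts
and Sciences.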
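Starting from the raw GPAs, the summary statistics and the test statistic for Example 2 can be recomputed as a check. This sketch assumes a two-tailed test with a hypothesized difference of – 0.25 and uses the table value 2.508 for 22 degrees of freedom. Working from unrounded means gives a t value near – 2.25, slightly different from the – 2.268 reported above, but the conclusion is the same.

```python
from math import sqrt
from statistics import mean, stdev

business = [2.86, 2.77, 3.18, 2.80, 3.14, 2.87, 3.19, 3.24, 2.91, 3.00, 2.83]
arts_sci = [3.35, 3.32, 3.36, 3.63, 3.41, 3.37, 3.45, 3.43, 3.44,
            3.17, 3.26, 3.18, 3.41]

n_b, n_a = len(business), len(arts_sci)
x_b, x_a = mean(business), mean(arts_sci)
s_b, s_a = stdev(business), stdev(arts_sci)  # sample sd, divisor n - 1

# Pooled standard deviation and standard error of the difference
df = n_b + n_a - 2
sp = sqrt(((n_b - 1) * s_b**2 + (n_a - 1) * s_a**2) / df)
se = sp * sqrt(1 / n_b + 1 / n_a)

hypothesized_diff = -0.25  # H0: business mean is 0.25 below arts and sciences
t = ((x_b - x_a) - hypothesized_diff) / se

t_crit = 2.508  # two-tailed, alpha = 0.02, 22 df (table value)
accept_h0 = -t_crit <= t <= t_crit

print(round(s_b, 3), round(s_a, 3), round(t, 2), accept_h0)
```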
Example 3:
A consumer-research organization routinely selects several car models each year and evaluates
their fuel efficiency. In this year’s study of two similar subcompact models from two different automakers,
the average gas mileage for 12 cars of brand A was 27.2 miles per litre, and the standard deviation was 3.8
mpl. The nine brand B cars that were tested averaged 32.1 mpl and the standard deviation was 4.3 mpl. At
α = 0.01, should it conclude that brand A cars have lower average mileage than brand B cars ?
Answer: sA = 3.8, nA = 12, $\bar{x}_A$ = 27.2; sB = 4.3, nB = 9, $\bar{x}_B$ = 32.1
H0 : µA = µB H1 : µA < µB α = 0.01
$s_p$ = 4.018 mpl, $\hat{\sigma}_{\bar{x}_A - \bar{x}_B}$ = 1.772 mpl
The lower limit of the acceptance region is t = – 2.539.
t value = – 2.766
Figure 4
Because – 2.766 < – 2.539, we reject H0. Brand B cars give significantly higher mileage than brand
A cars.
Testing Differences between Means with Dependent Samples
In the previous section, our samples were chosen independently of each other. In the sensitivity
example, samples were taken of managers who had gone through two different training programs.
Sometimes, it makes sense to take samples that are not independent of each other. Often the use of such
dependent or paired samples enables us to perform a more precise analysis, because they will allow us
to control for extraneous factors. We follow the same procedure of hypothesis testing; the only
differences are that we use a different formula for the estimated standard error of the sample differences
and that we require both samples to be of the same size.
Example 4:
A health specialist has advertised a weight-reducing program and has claimed that the average
participant in the program loses more than 17 kgs. A somewhat overweight executive is interested in the
program but is skeptical about the claims and asks for more evidence. The specialist allows him to select
randomly the records of 10 participants and record their weights before and after the program. Here we have
two samples (a before sample and an after sample) that are clearly dependent on each other, because the
same ten people have been observed twice.
Table – 1
Weights
Before 189 202 220 207 194 177 193 202 208 233
After 170 179 203 192 172 161 174 187 186 204
The overweight executive wants to test at the 5 per cent significance level the claimed average weight
loss of more than 17 kgs. We may state this problem as follows:
H0 : µ1 – µ2 = 17 (Null hypothesis : average weight loss is only 17 kgs)
H1 : µ1 – µ2 > 17 (Alternative hypothesis : average weight loss exceeds 17 kgs)
α = 0.05 (Level of significance)
What we are really interested in is not the weights before and after, but their difference.
Conceptually, what we have is not two samples of before and after weights, but one sample of weight
losses. If the population of weight losses has a mean $\mu_l$, we can restate our hypotheses as
H0 : $\mu_l$ = 17, H1 : $\mu_l$ > 17
Figure 5 illustrates this problem graphically. Because we want to know whether the mean weight loss
exceeds 17 kilograms, an upper-tailed test is appropriate. The 0.05 significance level is shown in figure 5
as the shaded area under the t distribution. We use the t distribution because the sample size is only 10; the
appropriate number of degrees of freedom is 9 (10 – 1). The t table gives the critical value of t = 1.833.
Figure 5
We begin by computing the individual losses, their mean, and standard deviation, and proceed exactly
as we did when testing hypotheses about a single mean. The computations are done below:
Table – 2
Finding the Mean Weight Loss and Standard Deviation
Before    After    Loss (x)    Loss Squared (x2)
189       170      19          361
202       179      23          529
220       203      17          289
207       192      15          225
194       172      22          484
177       161      16          256
193       174      19          361
202       187      15          225
208       186      22          484
233       204      29          841
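From Table 2, Σx = 197 and Σx2 = 4,055, so the mean loss is $\bar{x}$ = 197/10 = 19.7 kilograms and the
sample standard deviation is $s = \sqrt{(4{,}055 - 10(19.7)^2)/9}$ = 4.40 kilograms. The estimated standard error of
the mean is $\hat{\sigma}_{\bar{x}} = s/\sqrt{n} = 4.40/\sqrt{10}$ = 1.39 kilograms.
Next we standardize the observed average weight loss, $\bar{x}$ = 19.7 kilograms, by subtracting $\mu_{H_0}$, the
hypothesized average loss, and dividing by $\hat{\sigma}_{\bar{x}}$, the estimated standard error of the mean.
$t = \dfrac{\bar{x} - \mu_{H_0}}{\hat{\sigma}_{\bar{x}}} = \dfrac{19.7 - 17}{1.39} = 1.94$
Because our test of hypotheses is based on the t distribution, we use t to denote the standardized
statistic.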
Figure 6 shows the location of the sample mean weight loss on the standardized scale. We see that
the sample mean lies outside the acceptance region, so the executive can reject the null hypothesis and
conclude that the claimed weight loss in the program is legitimate.
Figure 6
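The paired-difference calculation for the weight-loss data can be reproduced with a few lines of code. This is a minimal sketch; the critical value 1.833 is the table value for 9 degrees of freedom quoted in the text.

```python
from math import sqrt
from statistics import mean, stdev

before = [189, 202, 220, 207, 194, 177, 193, 202, 208, 233]
after = [170, 179, 203, 192, 172, 161, 174, 187, 186, 204]

# One sample of weight losses instead of two samples of weights
losses = [b - a for b, a in zip(before, after)]
n = len(losses)

x_bar = mean(losses)   # mean loss
s = stdev(losses)      # sample standard deviation of the losses
se = s / sqrt(n)       # estimated standard error of the mean loss

mu_h0 = 17             # hypothesized average loss
t = (x_bar - mu_h0) / se

t_crit = 1.833         # upper-tailed, alpha = 0.05, 9 df (table value)
reject_h0 = t > t_crit

print(round(x_bar, 1), round(s, 2), round(se, 2), round(t, 2), reject_h0)
```

Reducing the two columns to a single list of differences is the whole point of the paired design: the test then proceeds exactly as a one-sample t test on `losses`.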
Let’s see how this paired difference test differs from a test of the difference of means of two
independent samples. Suppose that the data in table 2 represent two independent samples of 10
individuals entering the program and another 10 randomly selected individuals leaving the program. The
means and variances of the two samples are given in table 3.
Table – 3
Sample Size Mean Variance
Before 10 202.5 253.61
After 10 182.8 201.96
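Because the sample sizes are small, we use the pooled-variance equation to get a pooled estimate of $\sigma^2$
and then estimate $\sigma_{\bar{x}_1 - \bar{x}_2}$:
$s_p^2 = \dfrac{(10 - 1)(253.61) + (10 - 1)(201.96)}{10 + 10 - 2} = 227.79$
$\hat{\sigma}_{\bar{x}_1 - \bar{x}_2} = s_p\sqrt{\dfrac{1}{n_1} + \dfrac{1}{n_2}} = \sqrt{227.79}\sqrt{\dfrac{1}{10} + \dfrac{1}{10}} = 6.75$ kilograms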
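The appropriate test is now based on the t distribution with 18 degrees of freedom (10 + 10 – 2).
With a significance level of 0.05, the critical value of t from the statistical table is 1.734. The observed
difference of the sample means is
$\bar{x}_1 - \bar{x}_2$ = 202.5 – 182.8 = 19.7 kilograms
Now when we standardize the difference of the sample means for this independent-samples test,
we get
$t = \dfrac{(\bar{x}_1 - \bar{x}_2) - (\mu_1 - \mu_2)_{H_0}}{\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}} = \dfrac{19.7 - 17}{6.75} = 0.40$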
Once again, because our test of hypotheses is based on the t distribution, we use t to denote the
standardized statistic. Comparing the standardized difference of the sample means (0.40) with the critical
value of t (1.734), we see that the standardized sample statistic no longer falls outside the acceptance
region, so this test will not reject H0.
Why did these two tests give such different results? In the paired sample test, the sample standard
deviation of the individual differences was relatively small, so 19.7 kilograms was significantly larger than
the hypothesized weight loss of 17 kilograms. With independent samples, however, the estimated standard
deviation of the difference between the means depended on the standard deviation of the before weights
and the after weights. Because both of these were relatively large, $\hat{\sigma}_{\bar{x}_1 - \bar{x}_2}$ was also large, and thus 19.7 was
not significantly larger than 17. The paired sample test controlled this initial and final variability in weights
by looking only at the individual changes in weights. Because of this, it was better able to detect the
significance of the weight loss.
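The contrast between the two analyses can be made concrete by running both on the same weights. The sketch below treats the data first as independent samples (pooled variance) and then as pairs; variable names are illustrative.

```python
from math import sqrt
from statistics import mean, variance

before = [189, 202, 220, 207, 194, 177, 193, 202, 208, 233]
after = [170, 179, 203, 192, 172, 161, 174, 187, 186, 204]
n1, n2 = len(before), len(after)
mu_h0 = 17  # hypothesized weight loss

# Independent-samples view: pooled variance of the raw weights
var_before, var_after = variance(before), variance(after)
sp2 = ((n1 - 1) * var_before + (n2 - 1) * var_after) / (n1 + n2 - 2)
se_indep = sqrt(sp2) * sqrt(1 / n1 + 1 / n2)
t_indep = ((mean(before) - mean(after)) - mu_h0) / se_indep

# Paired view: variance of the individual losses only
losses = [b - a for b, a in zip(before, after)]
se_paired = sqrt(variance(losses) / n1)
t_paired = (mean(losses) - mu_h0) / se_paired

print(round(se_indep, 2), round(t_indep, 2))    # large standard error, small t
print(round(se_paired, 2), round(t_paired, 2))  # small standard error, larger t
```

Both tests compare the same 19.7-kilogram average loss with 17; only the standard error changes, which is why the paired test is the more sensitive of the two here.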
We conclude with two examples that show when to treat two samples of equal size as dependent or
independent:
1. An agricultural extension service wishes to determine whether a new hybrid seed corn has a
greater yield than an old standard variety. If the service asks 10 farmers to record the yield of an acre
planted with the new variety and asks another 10 farmers to record the yield of an acre planted with the
old variety, the two samples are independent. However, if it asks 10 farmers to plant one acre with each
variety and record the results, then the samples are dependent, and the paired difference test is
appropriate. In the latter case differences due to fertilizer, insecticide, and rainfall are controlled, because
each farmer treats his two acres identically. Thus, any differences in yield can be attributed solely to the
variety planted.
2. The director of the secretarial pool at a large legal office wants to determine whether typing speed
depends on the word-processing software used by a secretary. If she tests seven secretaries using
Picosoft Write and seven using Write Perfect, she should treat her samples as independent. If she tests the
same seven secretaries twice (once on each word processor), then the two samples are dependent. In the
paired difference test, differences among the secretaries are eliminated as a contributing factor, and the
differences in typing speeds can be attributed to the different word processors.
Example 5:
Mr. X is a quality control engineer with a windshield wiper manufacturing company. The company is
considering two new synthetic rubbers for its wiper blades, and Mr. X is given the responsibility to verify
whether blades made with the two new compounds wear equally well. He fitted 12 cars belonging to other
employees with one blade made of each of the two compounds. On cars 1 to 6, the right blade was made of
compound A and the left blade was made of compound B; on cars 7 to 12, compound A was used for the left
blade. The cars were driven under normal operating conditions until the blades did an unsatisfactory job of
cleaning the windshield of rain. The data below give the usable life (in days) of the blades. At α = 0.05, do
the two compounds wear equally well?
Car 1 2 3 4 5 6 7 8 9 10 11 12
Left Blade 162 323 220 274 165 271 233 156 238 211 241 154
Right Blade 183 347 247 269 189 257 224 178 263 199 263 148
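Answer: Let blade A be the blade made of compound A (the right blade on cars 1 to 6, the left blade on
cars 7 to 12) and blade B the blade made of compound B. X is the difference, blade A minus blade B.
Car Blade A Blade B Difference (X) X2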
1 183 162 21 441
2 347 323 24 576
3 247 220 27 729
4 269 274 –5 25
5 189 165 24 576
6 257 271 –14 196
7 233 224 9 81
8 156 178 –22 484
9 238 263 –25 625
10 211 199 12 144
11 241 263 –22 484
12 154 148 6 36
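ΣX = 35, ΣX2 = 4,397
$\bar{x}$ = 35/12 = 2.9167 days
s = 19.76 days
$\hat{\sigma}_{\bar{x}} = s/\sqrt{n}$ = 5.7041 days
H0 : µA = µB, H1 : µA ≠ µB, α = 0.05
Figure 7
The limits of the acceptance region are t = ± 2.201 (11 degrees of freedom).
t value = 2.9167 / 5.7041 = 0.511
Because 0.511 < 2.201, we accept H0. The two compounds are not significantly different with
respect to usable life.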
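Because compound A sits on the right blade for cars 1 to 6 and on the left blade for cars 7 to 12, the raw left/right readings must be rearranged by compound before differencing. The following sketch shows one way to do that and then runs the two-tailed paired test; the list names are illustrative, and 2.201 is the table value for 11 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

left = [162, 323, 220, 274, 165, 271, 233, 156, 238, 211, 241, 154]
right = [183, 347, 247, 269, 189, 257, 224, 178, 263, 199, 263, 148]

# Cars 1-6: compound A is the right blade; cars 7-12: compound A is the left blade
blade_a = right[:6] + left[6:]
blade_b = left[:6] + right[6:]

diffs = [a - b for a, b in zip(blade_a, blade_b)]
n = len(diffs)

x_bar = mean(diffs)
s = stdev(diffs)
se = s / sqrt(n)
t = x_bar / se  # H0: mean difference is zero

t_crit = 2.201  # two-tailed, alpha = 0.05, 11 df (table value)
accept_h0 = abs(t) <= t_crit

print(diffs)
print(round(x_bar, 4), round(s, 2), round(t, 3), accept_h0)
```

Swapping the blade positions halfway through the cars keeps any left-versus-right wear effect from being confused with the compound effect, which is why the data are regrouped by compound here.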
Example 6:
Nine computer-components dealers in major metropolitan areas were asked for their prices on two
similar color inkjet printers. The results are given below. At α = 0.05, is it reasonable to assert that, on
average, the Apson printer is less expensive than the HP printer?
Dealer 1 2 3 4 5 6 7 8 9
Apson price 250 319 285 260 305 295 289 309 275
HP price 270 325 269 275 289 285 295 325 300
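Answer : Let the mean HP price be represented by µ0 and the mean Apson price by µ1, and let X be the
HP price minus the Apson price at each dealer.
H0 : µ0 = µ1, H1 : µ0 > µ1, α = 0.05
Dealer Apson price HP price Difference (X) X2
1 250 270 20 400
2 319 325 6 36
3 285 269 –16 256
4 260 275 15 225
5 305 289 –16 256
6 295 285 –10 100
7 289 295 6 36
8 309 325 16 256
9 275 300 25 625
$\bar{x}$ = 46/9 = 5.11, s = 15.63, $\hat{\sigma}_{\bar{x}} = s/\sqrt{n} = 15.63/\sqrt{9}$ = 5.21
The upper limit of the acceptance region is t = 1.860 (8 degrees of freedom).
t value = 5.11 / 5.21 = 0.981
Figure 8
Because 0.981 < 1.860, we accept H0. On average, the Apson inkjet printer is not significantly less
expensive than the HP inkjet printer.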