Workbook.hypothesis testing.solutions
Solution:
We’re interested in finding out if the new pain reliever has a better success
rate than the current one. Since we’re given a percentage of success, we’ll
be using a population proportion p, instead of a population mean μ. And
since we’re looking at how much better the pain reliever will perform, we
use the > symbol in our alternative hypothesis, which means the null
hypothesis has to have the ≤ symbol.
H0 : p ≤ 0.85
Ha : p > 0.85
2. A research study on people who quit smoking wants to show that the
average number of attempts to quit before a smoker is successful is less
than 3.5 attempts. How should they set up their hypothesis statements?
Solution:
We’re interested in finding out if the mean number of attempts is less than
3.5, so we’ll be using a population mean μ. And since we’re looking at
whether the mean is less than 3.5, we use the < symbol in our alternative
hypothesis, which means the null hypothesis has to have the ≥ symbol.
H0 : μ ≥ 3.5
Ha : μ < 3.5
3. A factory creates a small metal cylindrical part that later becomes part
of a car engine. Because of variations in the process of manufacturing, the
diameters are not always identical. The machine was calibrated to create
cylinders with an average diameter of 1/16 of an inch. During a periodic
inspection, it became clear that further investigation was needed to
determine whether or not the machine responsible for making the part
needed recalibration. Write statistical hypothesis statements.
Solution:
The factory wants the mean diameter of the parts it produces to match
the diameter that they need, 1/16 of an inch. That means this is an example
of a statistical hypothesis statement that uses the population mean.
Parts that are either too small or too large could create problems, which
means the alternative hypothesis needs a ≠ sign and the null hypothesis
an = sign.
H0 : μ = 1/16
Ha : μ ≠ 1/16
Solution:
The claim of the marketing study is that creating the clothing line that
focuses on lime green and polka dots will increase sales by over 17%.
That means the alternative hypothesis needs to include the > sign, and
therefore the null hypothesis has to include a ≤ sign.
H0 : μ ≤ 0.17
Ha : μ > 0.17
5. A food company wants to ensure that less than 0.0001 % of its product
is contaminated. Which hypothesis statements will it write if it wants to
test for this?
Solution:
H0 : p ≥ 0.0001%
Ha : p < 0.0001%
Solution:
population proportion p. And since we’re looking at whether the
proportion is greater than 0.75, we use the > symbol in our alternative
hypothesis, which means the null hypothesis has to have the ≤ symbol.
H0 : p ≤ 0.75
Ha : p > 0.75
SIGNIFICANCE LEVEL AND TYPE I AND II ERRORS
Solution:
The only way to reduce both the Type I error rate and the Type II error rate
simultaneously is to increase the sample size. Therefore, if it’s important
that we reduce both error rates as much as possible, we should take the
largest possible sample.
Solution:
The power of a statistical test is the probability that we’ll reject the null
hypothesis when it’s false (make that particular correct choice).
Power = 1 − β
Power = 1 − 0.05
Power = 0.95
The power of the statistical test, given that the probability of making a
Type II error is 5%, is Power = 95%.
Solution:
The golfer’s null and alternative hypotheses are
H0 : p ≤ 0.75
Ha : p > 0.75
In reality, his null hypothesis is true, but based on the sample proportion
p̂ = 0.92, he may be in danger of rejecting the null when he shouldn’t.
          | H0 is true                        | H0 is false
Reject H0 | Type I error, P(Type I error) = α | CORRECT
Accept H0 | CORRECT                           | Type II error, P(Type II error) = β
Solution:
H0 : μ ≥ 15
Ha : μ < 15
In reality, their null hypothesis is false, but based on the sample mean
x̄ = 16, they may be in danger of accepting the null when they shouldn’t.
          | H0 is true                        | H0 is false
Reject H0 | Type I error, P(Type I error) = α | CORRECT
Accept H0 | CORRECT                           | Type II error, P(Type II error) = β
Solution:
H0 : p ≥ 0.7
Ha : p < 0.7
In reality, his null hypothesis is false, but based on the sample proportion
p̂ = 0.72, he may be in danger of accepting the null when he shouldn’t.
          | H0 is true                        | H0 is false
Reject H0 | Type I error, P(Type I error) = α | CORRECT
Accept H0 | CORRECT                           | Type II error, P(Type II error) = β
6. A coffee shop owner believes that he sells 500 cups of coffee each
day, on average, and he wants to test this assumption. The truth is, he
actually sells fewer than 500 cups each day. He takes a random sample of
10 days and records the number of cups he sells each of those days. What
kind of error is the coffee shop owner in danger of making?
Day       | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10
Cups sold | 488 | 502 | 496 | 506 | 492 | 489 | 510 | 511 | 506 | 500
Solution:
The owner’s null and alternative hypotheses are
H0 : μ = 500
Ha : μ ≠ 500
If we look at the data, we can see that the sample mean is x̄ = 500. In
reality, his null hypothesis is false, but based on the sample mean x̄ = 500,
he may be in danger of accepting the null when he shouldn’t.
          | H0 is true                        | H0 is false
Reject H0 | Type I error, P(Type I error) = α | CORRECT
Accept H0 | CORRECT                           | Type II error, P(Type II error) = β
Which means the coffee shop owner is in danger of making a Type II error.
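The sample mean quoted in this solution can be checked directly from the table. The short sketch below assumes the ten daily counts are transcribed correctly from the problem; the variable names are illustrative only.

```python
from statistics import mean

# Daily cups sold, copied from the table in the problem statement.
cups_sold = [488, 502, 496, 506, 492, 489, 510, 511, 506, 500]

# The sample mean is the sum of the observations divided by the sample size.
sample_mean = mean(cups_sold)
print(sample_mean)
```

Because the sample mean lands exactly on the hypothesized value, the sample gives the owner no evidence against the null, which is why a Type II error is the risk here.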
TEST STATISTICS FOR ONE- AND TWO-TAILED TESTS
1. A local high school states that its students perform much better than
average on a state exam. The average score for all high school students in
the state is 106 points. A sample of 256 students at this particular school
had an average test score of 129 points with a sample standard deviation
of 26.8. Choose and calculate the appropriate test statistic.
Solution:
The sample size is large enough at 256 high schoolers that we can assume
the distribution is approximately normal. In this case, we use a t-test
statistic.
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{129 - 106}{26.8 / \sqrt{256}} \approx 13.73 $$
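As a rough check on the arithmetic, the one-sample t-statistic can be computed in a few lines. This is a minimal sketch; the function name `one_sample_t` is a hypothetical helper, not something defined in the workbook.

```python
import math

def one_sample_t(sample_mean, mu0, sample_sd, n):
    """Return t = (x_bar - mu0) / (s / sqrt(n))."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error

# Values from the high school exam problem.
t_school = one_sample_t(sample_mean=129, mu0=106, sample_sd=26.8, n=256)
print(round(t_school, 2))
```

The same helper applies to any of the one-sample mean problems in this section by swapping in that problem's values.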
the restaurant and finds they contain an average of 1,250 calories per meal
with a sample standard deviation of 350.2. Choose and calculate the
appropriate test statistic.
Solution:
The sample size is large enough at 35 meals that we can assume the
distribution is approximately normal. In this case, we use a t-test statistic.
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{1,250 - 1,500}{350.2 / \sqrt{35}} \approx -4.22 $$
3. In a recent survey, 567 out of 768 randomly selected dog owners said
they used a kennel that was run by their veterinary office to board their
dogs while they were away on vacation. The study would like to make a
conclusion that the majority (more than 50%) of dog owners use a kennel
run by their veterinary office when the owners go on vacation. Choose and
calculate the appropriate test statistic.
Solution:
The sample size is large enough at 768 randomly selected individuals that
we can state the distribution is approximately normal. We can show this by
using the checks for the population proportion, np ≥ 10 and n(1 − p) ≥ 10.
The sample size is n = 768 and the sample proportion is
$$ \hat{p} = \frac{567}{768} \approx 0.738 $$
Therefore,
n p̂ = (768)(0.738) ≈ 567 ≥ 10
n(1 − p̂) = (768)(1 − 0.738) ≈ 201 ≥ 10
So we can say that the test statistic will be the z-test statistic for a
population proportion.
$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.738 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{768}}} \approx 13.21 $$
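The normality check and the z-statistic can be combined into one small routine. The sketch below is illustrative only: `proportion_z` is a hypothetical helper, and it applies the check to the sample proportion, as this solution does.

```python
import math

def proportion_z(successes, n, p0):
    """z-statistic for a one-sample proportion test against p0."""
    p_hat = successes / n
    # Large-sample check used in this solution: both counts must be at least 10.
    if n * p_hat < 10 or n * (1 - p_hat) < 10:
        raise ValueError("sample too small for the normal approximation")
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    return (p_hat - p0) / standard_error

# Dog owner survey: 567 of 768 respondents, tested against p0 = 0.5.
z_dogs = proportion_z(567, 768, 0.5)
print(round(z_dogs, 2))
```

Note that the code carries the unrounded sample proportion 567/768 through the calculation, which is what reproduces the two-decimal result shown above.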
3. State the type of test: upper-tailed, lower-tailed, or two-tailed.
Solution:
H0 : p ≥ 0.5
Ha : p < 0.5
n p̂ = (500)(0.486) = 243 ≥ 10
n(1 − p̂) = (500)(1 − 0.486) = 257 ≥ 10
Because both values are greater than 10, the distribution is approximately
normal. This is a lower-tailed test because the alternative hypothesis uses
the < sign.
$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.486 - 0.50}{\sqrt{\frac{0.5(1 - 0.5)}{500}}} \approx -0.6261 $$
5. The highest allowable amount of bromate in drinking water is
0.0100 mg/L. A survey of a city’s water quality took 50 water samples in
random locations around the city and found an average of 0.0102 mg/L of
bromate with a sample standard deviation of 0.0025 mg/L. The survey
committee is interested in testing if the amount of bromate found in the
water samples is higher than the allowable amount at a statistically
significant level.
Solution:
H0 : μ ≤ 0.0100
Ha : μ > 0.0100
This is a population mean with an unknown population standard deviation,
so we’ll calculate a t-test statistic with the population mean formula.
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{0.0102 - 0.0100}{0.0025 / \sqrt{50}} \approx 0.5657 $$
Solution:
H0 : μ = 38.60
Ha : μ ≠ 38.60
The sample is a simple random sample of 60 of her day-old chicks, which is
large enough that we can say the sampling distribution is approximately
normal. This is a two-tailed test because the alternative hypothesis uses
the ≠ sign.
$$ z = \frac{\bar{x} - \mu_0}{\sigma / \sqrt{n}} = \frac{39.10 - 38.60}{5.7 / \sqrt{60}} \approx 0.6795 $$
THE P-VALUE AND REJECTING THE NULL
Solution:
H0 : μ ≥ 230
Ha : μ < 230
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014
-2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019
-2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026
Because this is a lower-tail test, the p-value is just this value we found,
p = 0.0026. Comparing this to α = 0.01, we can see that p ≤ α, which means
we’ll reject the null hypothesis.
Therefore, at a significance level of α = 0.01, the trial can conclude that the
new medicine reduces cholesterol. Because the p-value we found is even
more significant, the trial could go even further, stating that the new
medicine reduces cholesterol at a significance level of p = 0.0026.
Solution:
H0 : μ = 283.6
Ha : μ ≠ 283.6
Because the alternative hypothesis uses a ≠ sign, this is a two-tailed test.
We were told in the problem that the test statistic is z = − 1.60, so we’ll look
that up in the z-table.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
-1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559
For the lower tail, z = −1.60 gives an area of 0.0548. Now to calculate our
p-value, we multiply this by 2.
p = 2(0.0548)
p = 0.1096
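The table lookup and doubling step can be reproduced with the standard normal CDF from Python's standard library. This is a sketch rather than the workbook's method; `two_tailed_p` is a hypothetical helper name.

```python
from statistics import NormalDist

def two_tailed_p(z):
    """Two-tailed p-value: twice the tail area beyond |z| under the standard normal curve."""
    lower_tail_area = NormalDist().cdf(-abs(z))
    return 2 * lower_tail_area

# Test statistic given in the problem.
p_value = two_tailed_p(-1.60)
print(round(p_value, 4))
```

Using `-abs(z)` means the function gives the same answer whether the test statistic falls in the lower or the upper tail.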
Solution:
The city wants to know if the amount of bromate in their drinking water is
higher than the allowable amount in a significant way.
H0 : μ ≤ 0.0100
Ha : μ > 0.0100
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.385 3.646
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
Because the test statistic and degrees of freedom give a p-value just
under p = 0.025, we’ll round the p-value to p = 0.025.
Solution:
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064
-2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084
-2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110
For a lower-tail test, the p-value is given by this value we found in the
z-table, so p = 0.0107.
We know that
Therefore,
• For p = 0.0107 and α = 0.01, p > α, so she’d fail to reject the null
• For p = 0.0107 and α = 0.001, p > α, so she’d fail to reject the null
significance level of α = 0.05. She assumes the population is normally
distributed.
2. Check that the conditions for performing the statistical test are
met.
Solution:
The conditions for performing a t-test with a population mean are an
approximately normal distribution and a simple random sample, and we’ve
been told in the problem that both of those conditions are met.
H0 : μ ≥ 125
Ha : μ < 125
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{118 - 125}{28.7 / \sqrt{16}} \approx -0.9756 $$
The next step is to find the p-value by looking up the test statistic in the
t-table. To look up a t-value, we’ll also need to know the degrees of
freedom from the problem. We know the study included 16 samples, so
the degrees of freedom are 16 − 1 = 15.
We calculated the test statistic as t ≈ − 0.9756. We’re looking for the area in
the lower tail, but the table will give us the area in the upper tail when
t = 0.9756. Remember these values are equal because the t-curve is
symmetric. Now we look up where our test statistic and degrees of
freedom intersect. The value we read from the t-table is somewhere
between p = 0.20 and p = 0.15.
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
14 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.787 4.140
15 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.733 4.073
16 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.686 4.015
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
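Reading a p-value range off the t-table amounts to finding the two critical values that bracket the test statistic. The sketch below encodes only the df = 15 row shown above; `bracket_p` is a hypothetical helper, and the returned pair is (upper bound, lower bound) on the one-tail p-value.

```python
# Column headers (upper-tail probabilities) and the df = 15 row from the table above.
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005]
DF15_ROW = [0.691, 0.866, 1.074, 1.341, 1.753, 2.131, 2.602, 2.947, 3.733, 4.073]

def bracket_p(t_abs, critical_values, tail_probs):
    """Return (upper, lower) bounds on the one-tail p-value for |t|.

    A bound is None when the table does not extend far enough to supply it.
    """
    for i, crit in enumerate(critical_values):
        if t_abs < crit:
            upper = tail_probs[i - 1] if i > 0 else None
            return (upper, tail_probs[i])
    # |t| is at least as large as the biggest tabulated critical value.
    return (tail_probs[-1], None)

# The t-curve is symmetric, so we look up |t| = 0.9756 for the lower-tail test.
bounds = bracket_p(0.9756, DF15_ROW, TAIL_PROBS)
print(bounds)
```

A table can only narrow the p-value to a range like this; an exact value needs software that evaluates the t-distribution directly.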
Solution:
H0 : μ ≥ 5
Ha : μ < 5
$$ t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}} = \frac{4.9 - 5}{0.5 / \sqrt{30}} = \frac{-0.1}{0.5 / \sqrt{30}} \approx -1.0954 $$
The next step is to look up the test statistic in the t-table. To look up a
t-value, we’ll also need to know the degrees of freedom from the problem.
We know the study included 30 students, so the degrees of freedom are
30 − 1 = 29.
We calculated the test statistic as t ≈ − 1.0954. We’re looking for the area in
the lower tail, but the table will give us the area in the upper tail when
t = 1.0954. Remember these values are equal because the t-curve is
symmetric. Now we look up where our test statistic and degrees of
freedom intersect. The value we read from the t-table is somewhere
between p = 0.15 and p = 0.10.
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.408 3.674
29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.396 3.659
30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.385 3.646
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
HYPOTHESIS TESTING FOR THE POPULATION PROPORTION
Solution:
The first step is to state the null and alternative hypotheses for the survey.
H0 : p ≥ 0.80
Ha : p < 0.80
$$ z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.73 - 0.80}{0.04} = -1.75 $$
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294
-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367
-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455
Since we have a one-tailed test, the p-value is p = 0.0401, and we were told
in the problem that α = 0.05. Because the p-value is less than the α-level,
p < α, we’ll reject the null hypothesis.
Solution:
H0 : p ≤ 64%
Ha : p > 64%
Because the alternative hypothesis uses a > sign, this is a one-tailed,
upper-tailed test. We were told that the test statistic is z = 1.40, so we’ll
look that up in the z-table.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177
1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319
1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441
This is an upper-tail test, which means the p-value is the area outside of
0.9192, or
p = 1 − 0.9192
p = 0.0808
Solution:
The first step is to state the null and alternative hypotheses for the survey.
H0 : p = 0.60
Ha : p ≠ 0.60
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
p = 1 − 0.7190
p = 0.2810
p = 2(0.2810)
p = 0.5620
4. A gambler wins 48% of the hands he plays, but he feels like he’s on a
losing streak recently, winning fewer hands than normal. He takes a
random sample of 40 of his recent hands, and finds the proportion of
winning hands in the sample to be p̂ = 0.45 with σp̂ = √0.00624 ≈ 0.079.
What can he conclude with 90% confidence?
Solution:
The first step is to state the null and alternative hypotheses for the survey.
H0 : p ≥ 0.48
Ha : p < 0.48
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121
-0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483
This is a lower-tail test, which means this is also the p-value, p = 0.3520.
We were told in the problem that α = 0.10. Because the p-value is greater
than the α-level, p > α, the gambler will fail to reject the null hypothesis,
and conclude that he hasn’t been on a losing streak at a statistically
significant level.
Solution:
The first step is to state the null and alternative hypotheses for the survey.
H0 : p = 0.92
Ha : p ≠ 0.92
The z-table gives 0.1922 for a z-score of z ≈ − 0.87.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
-0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611
-0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867
-0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148
Since this is a two-tailed test, we double this to find the p-value, and we
get
p = 2(0.1922)
p = 0.3844
We were told in the problem that α = 0.05. Because the p-value is greater
than the α-level, p > α, we’ll fail to reject the null hypothesis, which means
that we can’t conclude that the proportion of homeowners who purchase an
internet subscription plan is different from 92%.
6. A recent study reported that 15.3% of patients who are admitted
to the hospital with a heart attack die within 30 days of admission. The
same study reported that 16.7% of the 3,153 patients who went to the
hospital with a heart attack died within 30 days of admission when the lead
cardiologist was away.
1. State the population parameter and whether a t-test or z-test
should be used.
2. Check that the conditions for performing the statistical test are
met.
Solution:
The sample size is large at 3,153 with a sample proportion of 16.7%, but
to continue with the test we need to assume that the sample was a simple
random sample (since it’s not stated in the problem).
np = (3,153)(0.167) ≈ 527 ≥ 10
n(1 − p) = (3,153)(1 − 0.167) ≈ 2,626 ≥ 10
When these two conditions are met, then the distribution is approximately
normal. Then we can continue with the hypothesis test.
H0 : p = 0.153
Ha : p ≠ 0.153
Since we’re dealing with a population proportion, the z-test statistic will be
$$ z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.167 - 0.153}{\sqrt{\frac{0.153(1 - 0.153)}{3,153}}} \approx 2.1837 $$
The next step is to find the p-value by looking up the test statistic in the
z-table. Since this is a two-tailed test, we’ll need to double the area we find
in either the upper or lower tail. From the z-table, we find a value of 0.9854.
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817
2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857
2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890
But this is the area below the upper tail. Before we can do anything else
we need to find the area in the upper tail. The total area under the curve is
1, so we’ll subtract this value from 1.
1 − 0.9854 = 0.0146
p = 2(0.0146)
p = 0.0292
We know that
Therefore,
• For p = 0.0292 and α = 0.01, p > α, so we’d fail to reject the null
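The decision rule used throughout these solutions (reject the null when p ≤ α) is easy to apply at several significance levels at once. In the sketch below, `decide` is a hypothetical helper, and the levels 0.10 and 0.05 are added purely for illustration alongside the α = 0.01 case shown above.

```python
def decide(p_value, alpha):
    """Apply the p-value decision rule: reject H0 when p <= alpha."""
    return "reject H0" if p_value <= alpha else "fail to reject H0"

# p-value from the heart attack study; 0.10 and 0.05 are illustrative levels.
p_value = 0.0292
decisions = {alpha: decide(p_value, alpha) for alpha in (0.10, 0.05, 0.01)}

for alpha, decision in decisions.items():
    print(f"alpha = {alpha}: {decision}")
```

The same p-value can lead to different decisions depending on how strict the significance level is, which is why α should be fixed before looking at the data.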
CONFIDENCE INTERVAL FOR THE DIFFERENCE OF MEANS
Solution:
The sample variances are s1² = 0.35² = 0.1225 and s2² = 0.22² = 0.0484. The
variance s1² is more than twice the other variance, so the researcher will
assume unequal population variances.
Solution:
The sample variances are s1² = 22² = 484 and s2² = 26² = 676, and neither
sample variance is more than twice the other, so we can assume equal
population variances.
df = n1 + n2 − 2
df = 10 + 10 − 2
df = 18
Solution:
$$ s_p = \sqrt{\frac{9(22^2) + 9(26^2)}{18}} $$
$$ s_p = \sqrt{\frac{22^2 + 26^2}{2}} $$
$$ s_p \approx 24.083 $$
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
17 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.646 3.965
18 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.610 3.922
19 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.579 3.883
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
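$$ (a, b) = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
$$ (a, b) \approx (258 - 252) \pm 2.101 \times 24.083 \sqrt{\frac{1}{10} + \frac{1}{10}} $$
$$ (a, b) \approx 6 \pm 22.63 $$
$$ (a, b) \approx (-16.63, 28.63) $$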
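The pooled standard deviation and the resulting interval can be computed together. This is a minimal sketch with hypothetical helper names; the critical value 2.101 is passed in as read from the t-table for df = 18, rather than computed.

```python
import math

def pooled_sd(s1, n1, s2, n2):
    """Pooled standard deviation for two samples with assumed equal variances."""
    pooled_variance = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return math.sqrt(pooled_variance)

def pooled_interval(xbar1, xbar2, s1, n1, s2, n2, t_crit):
    """Confidence interval for mu1 - mu2 using the pooled standard deviation."""
    margin = t_crit * pooled_sd(s1, n1, s2, n2) * math.sqrt(1 / n1 + 1 / n2)
    difference = xbar1 - xbar2
    return difference - margin, difference + margin

# Sample values from this problem; t_crit comes from the df = 18 row of the table.
low, high = pooled_interval(258, 252, 22, 10, 26, 10, t_crit=2.101)
print(round(low, 2), round(high, 2))
```

Because the interval contains 0, the data are consistent with no difference between the two population means at this confidence level.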
Solution:
The sample variances are s1² = 4² = 16 and s2² = 5.5² = 30.25, and neither
sample variance is more than twice the other, so we’ll assume equal
population variances, which means the degrees of freedom will be
df = n1 + n2 − 2
df = 25 + 25 − 2
df = 48
The pooled standard deviation will be
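$$ s_p = \sqrt{\frac{384 + 726}{48}} $$
$$ s_p = \sqrt{23.125} $$
$$ ME = t_{\alpha/2} \times s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}} $$
$$ ME = 2.682 \times \sqrt{23.125} \sqrt{\frac{1}{25} + \frac{1}{25}} $$
$$ ME \approx 2.682 \sqrt{1.85} $$
$$ ME \approx 3.6479 $$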
Solution:
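$$ (a, b) = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$
$$ (a, b) = (14.5 - 13.6) \pm 1.65 \sqrt{\frac{2.25^2}{250} + \frac{2.02^2}{250}} $$
$$ (a, b) \approx (0.584, 1.216) $$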
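When the population standard deviations are known, the interval uses a z critical value and no pooling. The sketch below assumes the critical value 1.65 used in this solution; `z_interval` is a hypothetical helper name.

```python
import math

def z_interval(xbar1, xbar2, sigma1, n1, sigma2, n2, z_crit):
    """Confidence interval for mu1 - mu2 with known population standard deviations."""
    standard_error = math.sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    difference = xbar1 - xbar2
    return difference - z_crit * standard_error, difference + z_crit * standard_error

# Values from this problem; z_crit is the rounded table value used above.
low, high = z_interval(14.5, 13.6, 2.25, 250, 2.02, 250, z_crit=1.65)
print(round(low, 3), round(high, 3))
```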
Solution:
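$$ (a, b) = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2} \sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}} $$
$$ (a, b) = (1.6 - 2.5) \pm 2.33 \sqrt{\frac{0.4^2}{500} + \frac{0.2^2}{500}} $$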
HYPOTHESIS TESTING FOR THE DIFFERENCE OF MEANS
1. An ice cream shop owner believes his average daily revenue is higher
in August than it is in September. He calculated average daily revenue of
$496 in August and $456 in September, with standard deviations of $14 and
$21.5, respectively. What can he conclude at a 0.05 significance level using
a p-value approach?
Solution:
The sample variances are s1² = 14² = 196 and s2² = 21.5² = 462.25, so the
second sample variance is more than twice the first. Therefore, because
the sample variances are unequal, we can assume unequal population
variances. Additionally, we have large samples nA = 31 and nS = 30, since
there are 31 days in August and 30 days in September, so we’ll use a z-test.
We’ll run an upper-tailed test because the shop owner believes the
average daily revenue in August was higher than in September.
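$$ z = \frac{\bar{x}_A - \bar{x}_S}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_S^2}{n_S}}} $$
$$ z = \frac{496 - 456}{\sqrt{\frac{14^2}{31} + \frac{21.5^2}{30}}} $$
$$ z = \frac{40}{\sqrt{\frac{196}{31} + \frac{462.25}{30}}} $$
$$ z \approx 8.581 $$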
The test statistic is much larger than the largest z-value in the z-table, so
we could say that the probability of finding z ≈ 8.581 is almost 0. Therefore,
p ≤ α and we can reject the null hypothesis, concluding that the average
daily revenue in August was higher than in September.
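The claim that the p-value is almost 0 can be checked numerically instead of relying on the edge of the z-table. This is a sketch under the same large-sample assumption as the solution; `two_sample_z` is a hypothetical helper name.

```python
import math
from statistics import NormalDist

def two_sample_z(xbar1, s1, n1, xbar2, s2, n2):
    """z-statistic for a difference of means with unequal variances."""
    standard_error = math.sqrt(s1**2 / n1 + s2**2 / n2)
    return (xbar1 - xbar2) / standard_error

# August (sample 1) versus September (sample 2).
z_revenue = two_sample_z(496, 14, 31, 456, 21.5, 30)

# Upper-tail area; by symmetry it equals the CDF evaluated at -z.
p_upper = NormalDist().cdf(-z_revenue)
print(round(z_revenue, 3), p_upper < 0.05)
```

Evaluating the CDF at -z rather than computing 1 minus the CDF at z avoids losing the tiny tail area to floating-point rounding.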
Solution:
The sample variances are s1² = 1.05² = 1.1025 and s2² = 0.95² = 0.9025. Neither
sample variance is more than twice the other, which means we can assume
equal sample variances, and therefore equal population variances.
The null and alternative hypotheses for the upper-tailed test are
H0 : μ1 − μ2 ≤ 0
Ha : μ1 − μ2 > 0
Since both samples are larger than 30 and the population variances are
equal, we can calculate pooled standard deviation.
$$ s_p = \sqrt{\frac{49(1.05^2) + 49(0.95^2)}{98}} $$
$$ s_p = \sqrt{\frac{1.05^2 + 0.95^2}{2}} $$
$$ s_p \approx 1.00125 $$
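$$ z = \frac{\bar{x}_1 - \bar{x}_2}{s_p \sqrt{\frac{1}{n_1} + \frac{1}{n_2}}} $$
$$ z = \frac{6.12 - 5.5}{1.00125 \sqrt{\frac{1}{50} + \frac{1}{50}}} $$
$$ z = \frac{0.62}{1.00125 \sqrt{\frac{1}{25}}} $$
$$ z \approx 3.10 $$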
For an upper-tailed test and α = 0.01, the critical z-value is z = 2.33. Since
3.10 > 2.33, the coach can reject the null hypothesis and conclude that the
new weight loss program is more effective than the old program.
3. Test the claim that, in 2006, the mean weight of men in the US was not
significantly different from the mean weight of women. Previous research
showed population standard deviations were 10.25 pounds for men and
8.58 pounds for women. A random sample of 1,500 men has a mean weight
of 193.5 pounds and a random sample of 1,500 women has a mean weight
of 185.3 pounds. Assuming the population variances are unequal, use a
p-value approach to formulate a decision at the 0.05 significance level.
Solution:
Given the sample mean x̄m = 193.5 and population standard deviation
σm = 10.25 for men, and the sample mean x̄w = 185.3 and population
standard deviation σw = 8.58 for women, the null and alternative
hypotheses for the two-tailed test will be
H0 : μm − μw = 0
Ha : μm − μw ≠ 0
With unequal population variances and large samples, the test statistic will
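be
$$ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}} $$
$$ z = \frac{193.5 - 185.3}{\sqrt{\frac{10.25^2}{1,500} + \frac{8.58^2}{1,500}}} $$
$$ z = \frac{8.2}{\sqrt{\frac{105.0625 + 73.6164}{1,500}}} $$
$$ z = 8.2 \sqrt{\frac{1,500}{105.0625 + 73.6164}} $$
$$ z \approx 23.76 $$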
The test statistic is much larger than the largest z-value in the z-table, so
we could say that the probability of finding z ≈ 23.76 is almost 0. Therefore,
p ≤ α and we can reject the null hypothesis, concluding that the mean
weight of men and women was significantly different.
Solution:
The null and alternative hypotheses for the two-tailed test will be
H0 : μm − μw = 0
Ha : μm − μw ≠ 0
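With small samples and unequal population variances (s2² = 0.2² = 0.04 is
more than twice s1² = 0.13² = 0.0169), the t-statistic is
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
$$ t = \frac{1.48 - 1.62}{\sqrt{\frac{0.0169}{25} + \frac{0.04}{25}}} $$
$$ t = \frac{-0.14}{\sqrt{\frac{0.0569}{25}}} $$
$$ t \approx -2.9346 $$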
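$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} $$
$$ df = \frac{\left(\frac{0.0169}{25} + \frac{0.04}{25}\right)^2}{\frac{1}{25 - 1}\left(\frac{0.0169}{25}\right)^2 + \frac{1}{25 - 1}\left(\frac{0.04}{25}\right)^2} $$
$$ df \approx 41.21 $$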
5. Given x̄1 = 23.55 and x̄2 = 20.12 with s1 = 2.3, s2 = 2.9, n1 = 10, and n2 = 15,
determine whether the two population means differ significantly. Using a
critical value approach, and assuming population standard deviations are
unequal, what can we conclude at a 0.01 level of significance?
Solution:
H0 : μ1 − μ2 = 0
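Ha : μ1 − μ2 ≠ 0
$$ t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
$$ t = \frac{23.55 - 20.12}{\sqrt{\frac{2.3^2}{10} + \frac{2.9^2}{15}}} $$
$$ t = \frac{3.43}{\sqrt{\frac{5.29}{10} + \frac{8.41}{15}}} $$
$$ t \approx 3.286 $$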
Calculate the number of degrees of freedom.
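$$ df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2} $$
$$ df = \frac{\left(\frac{2.3^2}{10} + \frac{2.9^2}{15}\right)^2}{\frac{2.3^4}{10^2(10 - 1)} + \frac{2.9^4}{15^2(15 - 1)}} $$
$$ df \approx 22.17 $$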
Rounding down to the nearest whole number gives df = 22. From the
t-table, we find that the critical t-value for the two-tailed test with α = 0.01
and df = 22 is t = 2.819.
Because 3.286 > 2.819, we can reject the null hypothesis and conclude that
there’s a significant difference in population means.
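The unequal-variance t-statistic and its degrees of freedom are tedious by hand, so a short script is a useful cross-check. This sketch uses hypothetical helper names; the critical value 2.819 is the table value quoted above, not something the code derives.

```python
import math

def welch_t(xbar1, s1, n1, xbar2, s2, n2):
    """t-statistic for a difference of means without assuming equal variances."""
    return (xbar1 - xbar2) / math.sqrt(s1**2 / n1 + s2**2 / n2)

def welch_df(s1, n1, s2, n2):
    """Degrees of freedom for the unequal-variance t-test."""
    v1 = s1**2 / n1  # variance of the first sample mean
    v2 = s2**2 / n2  # variance of the second sample mean
    return (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))

# Values from problem 5.
t_stat = welch_t(23.55, 2.3, 10, 20.12, 2.9, 15)
df = math.floor(welch_df(2.3, 10, 2.9, 15))  # round down, as in the solution

# Two-tailed critical value read from the t-table for alpha = 0.01.
t_critical = 2.819
reject_null = abs(t_stat) > t_critical
print(round(t_stat, 3), df, reject_null)
```

Rounding the degrees of freedom down is the conservative choice, since a smaller df gives a larger critical value.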
Solution:
Using μ1, x̄1, and s1 for July and μ2, x̄2, and s2 for August, the hypothesis
statements for John’s upper-tailed test will be
H0 : μ1 − μ2 ≤ 0
Ha : μ1 − μ2 > 0
With large samples and unequal population variances, John’s test statistic
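will be
$$ z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}} $$
$$ z = \frac{28.4 - 27.3}{\sqrt{\frac{2.1^2}{31} + \frac{1.7^2}{31}}} $$
$$ z = 1.1 \sqrt{\frac{31}{4.41 + 2.89}} $$
$$ z \approx 2.27 $$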
The critical z-value for α = 0.05 with an upper-tailed test is z = 1.65. Since
2.27 > 1.65, John can reject the null hypothesis and conclude that the mean
temperature in July is higher than in August.
MATCHED-PAIR HYPOTHESIS TESTING
1. A golf club manufacturer claims that their new driver delivers 15 yards
of extra driving distance. They record the before and after driving
distances of 10 top professional players.
Player        | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10
Before x1     | 303 | 308 | 295 | 305 | 301 | 312 | 287 | 294 | 300 | 301
After x2      | 307 | 320 | 297 | 315 | 305 | 316 | 299 | 302 | 307 | 315
Difference, d | 4   | 12  | 2   | 10  | 4   | 4   | 12  | 8   | 7   | 14
Solution:
H0 : μ2 − μ1 ≤ 15
Ha : μ2 − μ1 > 15
where μ1 is the mean driving distance with the players’ current drivers, and
μ2 is the mean driving distance with the manufacturer’s new driver. And
because μ2 − μ1 is the difference in distance, the hypothesis statements
could also be written as
H0 : μd ≤ 15
Ha : μd > 15
To find the mean difference, we’ll sum the differences and divide by the
number of matched-pairs in our sample, n = 10.
$$ \bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{4 + 12 + 2 + 10 + 4 + 4 + 12 + 8 + 7 + 14}{10} = \frac{77}{10} = 7.7 $$
So the sample mean tells us that mean distance gained is 7.7 yards. Then
the sample standard deviation is
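$$ s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}} $$
$$ s_d = \sqrt{\frac{(-3.7)^2 + 4.3^2 + (-5.7)^2 + 2.3^2 + (-3.7)^2 + (-3.7)^2 + 4.3^2 + 0.3^2 + (-0.7)^2 + 6.3^2}{9}} $$
$$ s_d = \sqrt{\frac{13.69 + 18.49 + 32.49 + 5.29 + 13.69 + 13.69 + 18.49 + 0.09 + 0.49 + 39.69}{9}} $$
$$ s_d = \sqrt{\frac{156.1}{9}} $$
$$ s_d \approx \sqrt{17.34} $$
$$ s_d \approx 4.165 $$
$$ t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} $$
$$ t \approx \frac{7.7 - 15}{4.165 / \sqrt{10}} $$
$$ t \approx -7.3 \cdot \frac{\sqrt{10}}{4.165} $$
$$ t \approx -5.543 $$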
df = n − 1 = 10 − 1 = 9
At a significance level of 5% (a confidence level of 95% for an upper-tailed
test), and df = 9, the t-table gives 1.833.
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
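The matched-pair calculation for the golf driver data can be reproduced from the raw before and after distances. This is a sketch, with variable names chosen for illustration; it relies on `statistics.stdev`, which uses the n − 1 denominator, matching the sample standard deviation formula above.

```python
import math
from statistics import mean, stdev

# Driving distances from the table (yards).
before = [303, 308, 295, 305, 301, 312, 287, 294, 300, 301]
after = [307, 320, 297, 315, 305, 316, 299, 302, 307, 315]

# Matched-pair differences d = after - before.
diffs = [a - b for a, b in zip(after, before)]

d_bar = mean(diffs)   # sample mean difference
s_d = stdev(diffs)    # sample standard deviation of the differences
n = len(diffs)

# Test statistic against the claimed mean gain of 15 yards.
claimed_gain = 15
t_stat = (d_bar - claimed_gain) / (s_d / math.sqrt(n))
print(round(d_bar, 1), round(s_d, 3), round(t_stat, 3))
```

Since the test statistic is negative, it cannot exceed the positive critical value 1.833 for this upper-tailed test.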
2. A car company believes that the changes they’ve made to their hybrid
engine will increase miles per gallon by 4. They send out one car with the
old engine and one car with the new engine to drive the same route, and
record the miles per gallon of each pair of cars.
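Route         | 1   | 2   | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
Old engine    | 39  | 39  | 38 | 42 | 44 | 43 | 42 | 47 | 47 | 47
New engine    | 50  | 49  | 45 | 46 | 46 | 41 | 42 | 43 | 43 | 49
Difference, d | 11  | 10  | 7  | 4  | 2  | -2 | 0  | -4 | -4 | 2
d²            | 121 | 100 | 49 | 16 | 4  | 4  | 0  | 16 | 16 | 4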
Can the car company conclude at a 1 % significance level that the changes
they’ve made to the hybrid engine deliver 4 extra miles per gallon?
Solution:
The car company will define the values for the old engine as Population 1,
and the values for the new engine as Population 2, and their null and
alternative hypotheses will be
H0 : μ2 − μ1 ≤ 4
Ha : μ2 − μ1 > 4
where μ1 is the miles per gallon obtained by the old engine, and μ2 is the
miles per gallon obtained by the new engine. And because μ2 − μ1 is the
difference in miles per gallon, the hypothesis statements could also be
written as
H0 : μd ≤ 4
Ha : μd > 4
To find the mean difference, we’ll sum the differences and divide by the
number of matched-pairs in our sample, n = 10.
$$ \bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{11 + 10 + 7 + 4 + 2 + (-2) + 0 + (-4) + (-4) + 2}{10} = \frac{26}{10} = 2.6 $$
So the sample mean tells us that mean difference is 2.6 miles per gallon.
Then the sample standard deviation is
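$$ s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}} $$
$$ s_d = \sqrt{\frac{70.56 + 54.76 + 19.36 + 1.96 + 0.36 + 21.16 + 6.76 + 43.56 + 43.56 + 0.36}{9}} $$
$$ s_d = \sqrt{\frac{262.4}{9}} $$
$$ s_d \approx \sqrt{29.16} $$
$$ s_d \approx 5.400 $$
Because the population standard deviations are unknown, and/or because
both sample sizes are small, n1, n2 < 30, the test statistic will be
$$ t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} $$
$$ t \approx \frac{2.6 - 4}{5.400 / \sqrt{10}} $$
$$ t \approx -1.4 \cdot \frac{\sqrt{10}}{5.400} $$
$$ t \approx -0.82 $$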
df = n − 1 = 10 − 1 = 9
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
The car company’s t-test statistic t ≈ − 0.82 doesn’t meet the threshold
t = 2.821, so the critical value approach tells them that they can’t reject the
null hypothesis, and therefore can’t conclude that their new engine adds 4
miles per gallon.
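Student       | 1  | 2  | 3  | 4  | 5  | 6  | 7  | 8  | 9  | 10
In silence    | 14 | 13 | 16 | 21 | 15 | 19 | 11 | 20 | 19 | 16
With music    | 12 | 13 | 15 | 22 | 16 | 19 | 8  | 17 | 18 | 17
Difference, d | 2  | 0  | 1  | -1 | -1 | 0  | 3  | 3  | 1  | -1
d²            | 4  | 0  | 1  | 1  | 1  | 0  | 9  | 9  | 1  | 1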
Solution:
We’ll define the hours spent studying in silence as Population 1, and the
hours spent studying with classical music as Population 2, and our null and
alternative hypotheses will be
H0 : μ1 − μ2 ≤ 0
Ha : μ1 − μ2 > 0
where μ1 is the mean number of hours spent studying in silence, and μ2 is the
mean number of hours spent studying with classical music. And because
μ1 − μ2 is the difference in study time, the hypothesis statements could also
be written as
H0 : μd ≤ 0
Ha : μd > 0
To find the mean difference, we’ll sum the differences and divide by the
number of matched-pairs in our sample, n = 10.
$$ \bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{2 + 0 + 1 + (-1) + (-1) + 0 + 3 + 3 + 1 + (-1)}{10} = \frac{7}{10} = 0.7 $$
So the sample mean tells us that mean difference is 0.7 studying hours.
Then the sample standard deviation is
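$$ s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}} $$
$$ s_d = \sqrt{\frac{1.3^2 + (-0.7)^2 + 0.3^2 + (-1.7)^2 + (-1.7)^2 + (-0.7)^2 + 2.3^2 + 2.3^2 + 0.3^2 + (-1.7)^2}{9}} $$
$$ s_d = \sqrt{\frac{1.69 + 0.49 + 0.09 + 2.89 + 2.89 + 0.49 + 5.29 + 5.29 + 0.09 + 2.89}{9}} $$
$$ s_d = \sqrt{\frac{22.1}{9}} $$
$$ s_d \approx \sqrt{2.46} $$
$$ s_d \approx 1.567 $$
$$ t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} $$
$$ t = \frac{0.7 - 0}{1.567 / \sqrt{10}} $$
$$ t = 0.7 \cdot \frac{\sqrt{10}}{1.567} $$
$$ t \approx 1.413 $$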
and the degrees of freedom are
df = n − 1 = 10 − 1 = 9
Upper-tail probability p
df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005
8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041
9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781
10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587
50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%
Confidence level C
Our t-test statistic t ≈ 1.413 meets the threshold t = 1.383, so the critical
value approach tells us that we can reject the null hypothesis, and
therefore conclude that studying with classical music reduces time spent
on homework.
4. A clothing store wants to test the claim that customers who join their
VIP program return less merchandise. They track the mean monthly
merchandise returns of 10 customers for one year before and after joining
the VIP program, then record the mean returns per month.
Customer 1 2 3 4 5 6 7 8 9 10
Can they conclude at a 5 % significance level that joining the VIP program
reduces the amount of merchandise returns?
Solution:
The clothing store will define the values for returns before enrolling in the
VIP program as Population 1, and the values for returns after enrolling in
the VIP program as Population 2, and their null and alternative hypotheses
will be
H0 : μ1 − μ2 ≤ 0
Ha : μ1 − μ2 > 0
or, in terms of the mean difference μd = μ1 − μ2,
H0 : μd ≤ 0
Ha : μd > 0
To find the mean difference, we’ll sum the differences and divide by the
number of matched-pairs in our sample, n = 10.
\[ \bar{d} = \frac{\sum_{i=1}^{n} d_i}{n} = \frac{(-3) + 11 + 13 + 3 + (-3) + 6 + 3 + 3 + (-7) + (-11)}{10} = \frac{15}{10} = 1.5 \]
So the sample mean difference is 1.5 in merchandise returns. Then the
sample standard deviation is
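\[ s_d = \sqrt{\frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n - 1}} \]
\[ s_d = \sqrt{\frac{20.25 + 90.25 + 132.25 + 2.25 + 20.25 + 20.25 + 2.25 + 2.25 + 72.25 + 156.25}{10 - 1}} \]
\[ s_d = \sqrt{\frac{518.5}{9}} \]
\[ s_d \approx \sqrt{57.61} \]
\[ s_d \approx 7.59 \]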
Because the population standard deviation of the differences is unknown,
and the sample of matched pairs is small, n < 30, the test statistic will be
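\[ t = \frac{\bar{d} - \mu_d}{s_d / \sqrt{n}} \]
\[ t \approx \frac{1.5 - 0}{7.59 / \sqrt{10}} \]
\[ t \approx 1.5 \cdot \frac{\sqrt{10}}{7.59} \]
\[ t \approx 0.625 \]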
df = n − 1 = 10 − 1 = 9
Upper-tail probability p
df   0.25   0.20   0.15   0.10   0.05   0.025  0.01   0.005  0.001  0.0005
8    0.706  0.889  1.108  1.397  1.860  2.306  2.896  3.355  4.501  5.041
9    0.703  0.883  1.100  1.383  1.833  2.262  2.821  3.250  4.297  4.781
10   0.700  0.879  1.093  1.372  1.812  2.228  2.764  3.169  4.144  4.587
C    50%    60%    70%    80%    90%    95%    98%    99%    99.8%  99.9%
Confidence level C
The clothing store’s test statistic t ≈ 0.625 is below the critical value
t = 1.833 (df = 9, upper-tail probability 0.05), so the critical value
approach tells them that they can’t reject the null hypothesis, and
therefore can’t conclude that their VIP program causes customers to return
less merchandise.
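The critical value comparison can be sketched in Python as a simple table lookup. The dictionary below copies part of the df = 9 row of the t-table, the differences are the ones summed in the solution, and all names are hypothetical choices for illustration.

```python
import math

# df = 9 row of the t-table: upper-tail probability -> critical t-value.
T_CRITICAL_DF9 = {0.25: 0.703, 0.20: 0.883, 0.15: 1.100, 0.10: 1.383,
                  0.05: 1.833, 0.025: 2.262, 0.01: 2.821, 0.005: 3.250}

# Differences in returns (before minus after) used in the solution.
diffs = [-3, 11, 13, 3, -3, 6, 3, 3, -7, -11]

n = len(diffs)
d_bar = sum(diffs) / n
s_d = math.sqrt(sum((d - d_bar) ** 2 for d in diffs) / (n - 1))
t = (d_bar - 0) / (s_d / math.sqrt(n))  # mu_d = 0 under the null hypothesis

alpha = 0.05
reject = t > T_CRITICAL_DF9[alpha]      # upper-tailed test
print(round(t, 3), reject)
```

Because the test is upper-tailed, we reject only when t is larger than the table value for the chosen significance level.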
Solution:
\[ (a, b) = 10 \pm 2.064\left(\frac{2.5}{\sqrt{25}}\right) \]
\[ (a, b) = 10 \pm 2.064(0.5) \]
\[ (a, b) = 10 \pm 1.032 \]
\[ (a, b) = (8.968, 11.032) \]
So we’re 95 % confident that the mean difference falls between 8.968 and
11.032.
6. If the mean difference is d̄ = 24 on a sample of n = 49 with population
standard deviation σd = 3.2, calculate the 99 % confidence interval around d̄.
Solution:
For 99 % confidence with critical values z = ± 2.58 and a large sample n ≥ 30,
the confidence interval will be
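\[ (a, b) = \bar{d} \pm z_{\alpha/2} \cdot \frac{\sigma_d}{\sqrt{n}} \]
\[ (a, b) = 24 \pm 2.58\left(\frac{3.2}{\sqrt{49}}\right) \]
\[ (a, b) \approx 24 \pm 2.58(0.457) \]
\[ (a, b) \approx 24 \pm 1.179 \]
\[ (a, b) \approx (22.821, 25.179) \]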
So we’re 99 % confident that the mean difference falls between 22.821 and
25.179.
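The interval formula above is easy to check with a few lines of Python. This is only a sketch: the function name mean_diff_interval is our own, and the critical value is passed in by hand (z = 2.58 here) rather than looked up.

```python
import math

def mean_diff_interval(d_bar, sd, n, critical):
    """Confidence interval d_bar ± critical * sd / sqrt(n)."""
    margin = critical * sd / math.sqrt(n)
    return d_bar - margin, d_bar + margin

# Values from this problem: d_bar = 24, sigma_d = 3.2, n = 49, z = 2.58.
low, high = mean_diff_interval(24, 3.2, 49, 2.58)
print(round(low, 3), round(high, 3))
```

The same function works for a small sample with an unknown population standard deviation if we pass the sample standard deviation and a t critical value instead.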
CONFIDENCE INTERVAL FOR THE DIFFERENCE OF PROPORTIONS
Solution:
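\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{54}{150} = 0.36 \]
\[ \hat{p}_2 = \frac{x_2}{n_2} = \frac{47}{160} \approx 0.294 \]
\[ (a, b) \approx 0.066 \pm 1.96\sqrt{\frac{0.36(0.64)}{150} + \frac{0.294(0.706)}{160}} \]
Then the 95 % confidence interval is
\[ (a, b) \approx (-0.038, 0.170) \]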
2. A light bulb manufacturer wants to know whether their own bulbs last
longer than a competitor’s bulb. They randomly sampled 150 people who
bought their bulb, and 72 of them reported that it lasted longer than 250
days. They randomly sampled 150 people who bought the competitor’s
bulb, and 69 of them reported that it lasted for more than 250 days. Find a
90 % confidence interval around the difference of proportions.
Solution:
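\[ \hat{p}_1 = \frac{72}{150} = 0.48 \]
\[ \hat{p}_2 = \frac{69}{150} = 0.46 \]
\[ (a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}} \]
\[ (a, b) = 0.02 \pm 1.65\sqrt{\frac{0.48(0.52)}{150} + \frac{0.46(0.54)}{150}} \]
\[ (a, b) \approx (-0.075, 0.115) \]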
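Here's a short Python sketch of the same interval, so the square root doesn't have to be worked by hand. The function name prop_diff_interval is an illustrative choice, and the critical value z = 1.65 is the rounded table value used in this solution.

```python
import math

def prop_diff_interval(x1, n1, x2, n2, z):
    """Confidence interval for p1 - p2 using the unpooled standard error."""
    p1, p2 = x1 / n1, x2 / n2
    se = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    margin = z * se
    return (p1 - p2) - margin, (p1 - p2) + margin

# 72 of 150 for the manufacturer's bulb, 69 of 150 for the competitor's.
low, high = prop_diff_interval(72, 150, 69, 150, 1.65)
print(round(low, 3), round(high, 3))
```

Since the interval contains 0, the data don't show a clear difference between the two proportions at this confidence level.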
Solution:
\[ \hat{p}_1 = \frac{38}{50} = 0.76 \]
\[ \hat{p}_2 = \frac{24}{50} = 0.48 \]
\[ (a, b) = 0.28 \pm 2.58\sqrt{\frac{0.76(0.24)}{50} + \frac{0.48(0.52)}{50}} \]
\[ (a, b) \approx (0.04, 0.52) \]
Because the interval doesn’t contain 0, we can be 99 % confident that the
Vitamin C treatment shortens recovery time from the common cold.
4. A researcher randomly chose 900 smokers, 450 men and 450 women.
He found that 357 of the male smokers have been diagnosed with coronary
artery disease, while 295 of the female smokers have been diagnosed with
coronary artery disease. Construct a 95 % confidence interval to estimate
the difference between the proportions of male and female smokers who
have been diagnosed with coronary artery disease.
Solution:
\[ \hat{p}_1 = \frac{357}{450} \approx 0.793 \]
\[ \hat{p}_2 = \frac{295}{450} \approx 0.656 \]
\[ (a, b) = 0.137 \pm 1.96\sqrt{\frac{0.793(0.207)}{450} + \frac{0.656(0.344)}{450}} \]
\[ (a, b) \approx (0.079, 0.195) \]
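The critical values used in these intervals (1.65, 1.96, 2.33, 2.58) are rounded table values. As a sketch, they can also be computed with Python's standard library, assuming Python 3.8 or later for statistics.NormalDist. The helper name z_critical is our own.

```python
from statistics import NormalDist

def z_critical(confidence):
    """Two-sided critical value z_{alpha/2} for a confidence level like 0.95."""
    alpha = 1 - confidence
    # inv_cdf returns the z-value with the given area to its left.
    return NormalDist().inv_cdf(1 - alpha / 2)

for level in (0.90, 0.95, 0.98, 0.99):
    print(level, round(z_critical(level), 3))
```

The computed values carry more decimal places than the table, so an interval built from them can differ from the hand calculation in the last digit.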
Solution:
With p1̂ = 0.07 for n1 = 1,000 and p2̂ = 0.12 for n2 = 1,200, and critical values of
z = ± 2.58 for a 99 % confidence level, the confidence interval will be
\[ (a, b) = (0.07 - 0.12) \pm 2.58\sqrt{\frac{0.07(1 - 0.07)}{1,000} + \frac{0.12(1 - 0.12)}{1,200}} \]
\[ (a, b) = -0.05 \pm 2.58\sqrt{\frac{0.07(0.93)}{1,000} + \frac{0.12(0.88)}{1,200}} \]
\[ (a, b) \approx -0.05 \pm 0.032 \]
\[ (a, b) \approx (-0.082, -0.018) \]
Solution:
\[ \hat{p}_1 = \frac{24}{280} \approx 0.086 \]
\[ \hat{p}_2 = \frac{34}{350} \approx 0.097 \]
\[ (a, b) = -0.011 \pm 2.33\sqrt{\frac{0.086(0.914)}{280} + \frac{0.097(0.903)}{350}} \]
\[ (a, b) \approx (-0.065, 0.043) \]
HYPOTHESIS TESTING FOR THE DIFFERENCE OF PROPORTIONS
H0 : p1 − p2 ≤ 0
Ha : p1 − p2 > 0
Solution:
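\[ \hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} \]
\[ \hat{p} = \frac{0.456(278) + 0.384(310)}{278 + 310} \]
\[ \hat{p} \approx 0.418 \]
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ z \approx \frac{0.456 - 0.384}{\sqrt{0.418(1 - 0.418)\left(\frac{1}{278} + \frac{1}{310}\right)}} \]
\[ z \approx \frac{0.072}{\sqrt{0.418(0.582)\left(\frac{1}{278} + \frac{1}{310}\right)}} \]
\[ z \approx 1.767 \]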
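The pooled proportion and the z-statistic can be sketched in Python as follows. The function name pooled_z is hypothetical, and the inputs are the sample proportions and sample sizes used in this solution.

```python
import math

def pooled_z(p1_hat, n1, p2_hat, n2):
    """Return (pooled proportion, z statistic) for H0: p1 - p2 = 0."""
    # Weight each sample proportion by its sample size.
    p_pool = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return p_pool, (p1_hat - p2_hat) / se

p_pool, z = pooled_z(0.456, 278, 0.384, 310)
print(round(p_pool, 3), round(z, 3))
```

The code keeps the pooled proportion unrounded, so its z-value can differ slightly from a hand calculation that rounds the pooled proportion to 0.418 first.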
H0 : p1 − p2 = 0
Ha : p1 − p2 ≠ 0
Solution:
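\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{234}{1,150} \approx 0.203 \]
\[ \hat{p}_2 = \frac{x_2}{n_2} = \frac{327}{1,320} \approx 0.248 \]
\[ \hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} \]
\[ \hat{p} = \frac{0.203(1,150) + 0.248(1,320)}{1,150 + 1,320} \]
\[ \hat{p} \approx 0.227 \]
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ z \approx \frac{0.203 - 0.248}{\sqrt{0.227(1 - 0.227)\left(\frac{1}{1,150} + \frac{1}{1,320}\right)}} \]
\[ z \approx \frac{-0.045}{\sqrt{0.227(0.773)\left(\frac{1}{1,150} + \frac{1}{1,320}\right)}} \]
\[ z \approx -2.663 \]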
3. A cinema owner wants to know whether there’s a difference in the
proportions of boys and girls who watched a new movie last week. She
randomly sampled 76 boys and 75 girls and found that 45 boys and 58 girls
watched the movie. What can she conclude about the difference of
proportions at a 99 % confidence level?
Solution:
H0 : p1 − p2 = 0
Ha : p1 − p2 ≠ 0
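\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{45}{76} \approx 0.592 \]
\[ \hat{p}_2 = \frac{x_2}{n_2} = \frac{58}{75} \approx 0.773 \]
\[ \hat{p} = \frac{45 + 58}{76 + 75} \]
\[ \hat{p} \approx 0.682 \]
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ z = \frac{0.592 - 0.773}{\sqrt{0.682(1 - 0.682)\left(\frac{1}{76} + \frac{1}{75}\right)}} \]
\[ z = \frac{-0.181}{\sqrt{0.682(0.318)\left(\frac{1}{76} + \frac{1}{75}\right)}} \]
\[ z \approx -2.39 \]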
For a 99 % confidence level and a two-tailed test, the critical z-values are
z = ± 2.58. Since −2.39 falls between −2.58 and 2.58, she fails to reject the
null hypothesis. There’s not enough evidence to conclude that the
proportions of boys and girls who watched the movie are different.
Solution:
Assuming p1 is the population proportion of women who spend more than
$100 and p2 is the population proportion of men who spend more than $100,
the store owner’s hypothesis statements for the upper-tailed test will be
H0 : p1 − p2 ≤ 0.22
Ha : p1 − p2 > 0.22
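\[ \hat{p}_1 = \frac{x_1}{n_1} = \frac{23}{32} \approx 0.7188 \]
\[ \hat{p}_2 = \frac{x_2}{n_2} = \frac{14}{32} = 0.4375 \]
\[ \hat{p} = \frac{x_1 + x_2}{n_1 + n_2} \]
\[ \hat{p} = \frac{23 + 14}{32 + 32} \]
\[ \hat{p} \approx 0.578 \]
\[ z \approx \frac{(0.7188 - 0.4375) - 0.22}{\sqrt{0.578(1 - 0.578)\left(\frac{1}{32} + \frac{1}{32}\right)}} \]
\[ z \approx \frac{0.0613}{\sqrt{0.578(0.422)\left(\frac{1}{16}\right)}} \]
\[ z \approx 0.50 \]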
z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09
0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879
0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224
0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549
From the z-table, we can see that the area to the left of z = 0.50 is 0.6915.
Because this is an upper-tailed test, we’re interested in the area to the
right of z = 0.50, which is 1 − 0.6915 = 0.3085. Therefore, p = 0.3085. Since
0.3085 > 0.1, the store owner fails to reject the null hypothesis. There’s not
enough evidence to conclude that the proportion of women who spend more
than $100 exceeds the proportion of men who do by more than 22 %.
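The p-value for an upper-tailed test is the area to the right of the test statistic. Here's a minimal Python sketch of that step, assuming Python 3.8 or later for statistics.NormalDist. The function name upper_tail_p_value is our own, and the significance level 0.10 is the one used in the comparison above.

```python
from statistics import NormalDist

def upper_tail_p_value(z):
    """Area under the standard normal curve to the right of z."""
    return 1 - NormalDist().cdf(z)

z = 0.50
alpha = 0.10
p_value = upper_tail_p_value(z)
reject = p_value < alpha  # reject H0 only when the p-value is below alpha
print(round(p_value, 4), reject)
```

For a lower-tailed test we'd use the area to the left, NormalDist().cdf(z), and for a two-tailed test we'd double the smaller tail area.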
Solution:
H0 : p1 − p2 ≥ 0
Ha : p1 − p2 < 0
p1̂ = 0.14
p2̂ = 0.23
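\[ \hat{p} = \frac{0.14(60) + 0.23(75)}{60 + 75} \]
\[ \hat{p} = 0.19 \]
\[ z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}} \]
\[ z = \frac{0.14 - 0.23}{\sqrt{0.19(1 - 0.19)\left(\frac{1}{60} + \frac{1}{75}\right)}} \]
\[ z = \frac{-0.09}{\sqrt{0.19(0.81)\left(\frac{1}{60} + \frac{1}{75}\right)}} \]
\[ z \approx -1.325 \]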
6. John and Steven are two fitness trainers who want to compare their
client satisfaction rates. John chose a random sample of 85 clients and
Steven chose a random sample of 72 clients. John found that 89 % of his
clients were satisfied and Steven found that 91 % of his clients were
satisfied. Using a critical value approach at a 95 % confidence level, is there
a significant difference between the proportions?
Solution:
H0 : p1 − p2 = 0
Ha : p1 − p2 ≠ 0
p1̂ = 0.89
p2̂ = 0.91
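\[ \hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2} \]
\[ \hat{p} = \frac{0.89(85) + 0.91(72)}{85 + 72} \]
\[ \hat{p} \approx 0.899 \]
\[ z = \frac{0.89 - 0.91}{\sqrt{0.899(1 - 0.899)\left(\frac{1}{85} + \frac{1}{72}\right)}} \]
\[ z = \frac{-0.02}{\sqrt{0.899(0.101)\left(\frac{1}{85} + \frac{1}{72}\right)}} \]
\[ z \approx -0.414 \]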
For a 95 % confidence level and a two-tailed test, the critical z-values are
z = ± 1.96. Since −0.414 falls between −1.96 and 1.96, we fail to reject the null
hypothesis. There’s not enough evidence to conclude that there’s a
significant difference between John’s and Steven’s client satisfaction rates.
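The two-tailed critical value decision can be sketched in Python like this. The function name two_tailed_prop_test is an illustrative choice, and the critical value 1.96 is passed in by hand to match the 95 % confidence level.

```python
import math

def two_tailed_prop_test(p1_hat, n1, p2_hat, n2, z_crit):
    """Return (z, reject) for H0: p1 - p2 = 0 against Ha: p1 - p2 != 0."""
    p_pool = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    z = (p1_hat - p2_hat) / se
    # Two-tailed: reject when z falls outside (-z_crit, z_crit).
    return z, abs(z) > z_crit

# John: 89 % of 85 clients; Steven: 91 % of 72 clients.
z, reject = two_tailed_prop_test(0.89, 85, 0.91, 72, 1.96)
print(round(z, 3), reject)
```

Comparing the absolute value of z to a single critical value is equivalent to checking whether z falls between the two critical values.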