
Workbook.hypothesis testing.solutions

The document discusses hypothesis testing in inferential statistics, providing examples of setting up null and alternative hypotheses for various scenarios involving population proportions and means. It also covers the significance level, Type I and Type II errors, and the importance of sample size in reducing error rates. Additionally, it includes calculations for test statistics for one- and two-tailed tests based on sample data.

Uploaded by

kart238
Copyright © All Rights Reserved
Hypothesis testing

INFERENTIAL STATISTICS AND HYPOTHESES

1. A current pain reliever has an 85 % success rate of treating pain. A


company develops a new pain reliever and wants to show that its success
rate of treating pain is better than the current option. Decide if the
hypothesis statement would require a population proportion or a
population mean, then set up the statistical hypothesis statements for the
situation.

Solution:

We’re interested in finding out if the new pain reliever has a better success
rate than the current one. Since we’re given a percentage of success, we’ll
be using a population proportion p, instead of a population mean μ. And
since we’re looking at how much better the pain reliever will perform, we
use the > symbol in our alternative hypothesis, which means the null
hypothesis has to have the ≤ symbol.

H0 : p ≤ 0.85

Ha : p > 0.85

2. A research study on people who quit smoking wants to show that the
average number of attempts to quit before a smoker is successful is less
than 3.5 attempts. How should they set up their hypothesis statements?

Solution:

We’re interested in finding out if the mean number of attempts is less than
3.5, so we’ll be using a population mean μ. And since we’re looking at
whether the mean is less than 3.5, we use the < symbol in our alternative
hypothesis, which means the null hypothesis has to have the ≥ symbol.

H0 : μ ≥ 3.5

Ha : μ < 3.5

3. A factory creates a small metal cylindrical part that later becomes part
of a car engine. Because of variations in the process of manufacturing, the
diameters are not always identical. The machine was calibrated to create
cylinders with an average diameter of 1/16 of an inch. During a periodic
inspection, it became clear that further investigation was needed to
determine whether or not the machine responsible for making the part
needed recalibration. Write statistical hypothesis statements.

Solution:

The factory wants the mean diameter of the parts it produces to match
the diameter that they need, 1/16 of an inch. That means this is an example
of a statistical hypothesis statement that uses the population mean.

Both parts that are too small or too large could create problems, which
means the alternative hypothesis needs to have a ≠ sign. Which means the
null hypothesis will include an = sign.

H0 : μ = 1/16

Ha : μ ≠ 1/16

4. A marketing study for a clothing company concluded that the mean


percentage increase in sales could potentially be over 17 % for creating a
clothing line that focused on lime green and polka dots. Which hypothesis
statements do they need to write in order to test their theory?

Solution:

The claim of the marketing study is that creating the clothing line that
focuses on lime green and polka dots will increase sales by over 17 % .
Which means the alternative hypothesis would need to include the > sign,
and therefore that the null hypothesis has to include a ≤ sign.

H0 : μ ≤ 0.17

Ha : μ > 0.17

5. A food company wants to ensure that less than 0.0001 % of its product
is contaminated. Which hypothesis statements will it write if it wants to
test for this?

Solution:

The food company wants the proportion of contaminated product to be


less than 0.0001 % , so they’ll be using a population proportion p. And since
they’re looking at whether the proportion is less than 0.0001 % , they’ll use
the < symbol in the alternative hypothesis, which means the null
hypothesis has to have the ≥ symbol.

H0 : p ≥ 0.0001 %

Ha : p < 0.0001 %

6. A new medication is being developed to prevent heart worms in dogs,


and the developer wants it to work better than the current medication.
The current medication prevents heart worms at a rate of 75 % . What
hypothesis statements should they write if they want to test whether or
not the new medication works better than the existing one?

Solution:

The developer wants the proportion of dogs in which heart worm is


prevented by their medication to be greater than 0.75, so they’ll be using a
population proportion p. And since we’re looking at whether the
proportion is greater than 0.75, we use the > symbol in our alternative
hypothesis, which means the null hypothesis has to have the ≤ symbol.

H0 : p ≤ 0.75

Ha : p > 0.75
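
The rule used throughout this section (the claim to be shown goes in the alternative hypothesis, and the null takes the complementary symbol) can be sketched in a few lines of Python. The function name `hypotheses` and the labels "greater", "less", and "different" are hypothetical choices made for illustration, not part of the workbook.

```python
def hypotheses(parameter: str, value: float, claim: str) -> tuple[str, str]:
    """Return (H0, Ha) strings for a claim of 'greater', 'less', or 'different'."""
    # The claim being tested goes in Ha; H0 gets the complementary symbol.
    symbols = {
        "greater": ("≤", ">"),
        "less": ("≥", "<"),
        "different": ("=", "≠"),
    }
    null_symbol, alt_symbol = symbols[claim]
    h0 = f"H0 : {parameter} {null_symbol} {value}"
    ha = f"Ha : {parameter} {alt_symbol} {value}"
    return h0, ha


# Problem 1: the new pain reliever should beat an 85 % success rate.
h0, ha = hypotheses("p", 0.85, "greater")
print(h0)
print(ha)

# Problem 2: the mean number of quit attempts should be below 3.5.
h0_quit, ha_quit = hypotheses("μ", 3.5, "less")
print(h0_quit)
print(ha_quit)
```

Whether the parameter is p or μ still has to be decided by reading the problem: rates and percentages of a group point to a proportion, averages point to a mean.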

SIGNIFICANCE LEVEL AND TYPE I AND II ERRORS

1. We’re running a statistical test on a new pharmaceutical drug. The


stakes are high, because the side effects of the drug could potentially be
serious, or even fatal. If we want to reduce the Type I and Type II error
rates as low as possible to avoid rejecting the null when it’s true or
accepting the null when it’s false, what should we do when we take the
sample?

Solution:

The only way to reduce both the Type I error rate and Type II error rate
simultaneously is to increase the sample size. Therefore, if it’s important
that we reduce error rate as low as possible, we should take the largest
possible sample.

2. If the probability of making a Type II error in a statistical test is 5 % ,


what is the power of the test?

Solution:

The power of a statistical test is the probability that we’ll reject the null
hypothesis when it’s false (make that particular correct choice).

             H0 is true               H0 is false
Reject H0    Type I error             CORRECT
             P(Type I error) = α      Power
Accept H0    CORRECT                  Type II error
                                      P(Type II error) = β

Power is always equal to 1 − β, where β is another name for the Type II error
rate. So

Power = 1 − β

Power = 1 − Type II error rate

Power = 1 − 0.05

Power = 0.95

The power of the statistical test, given that the probability of making a
Type II error is 5 % , is Power = 95 % .
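
As a quick check of the relationship Power = 1 − β, here is a minimal Python sketch. The helper name `power_from_beta` is an assumption made for this example.

```python
def power_from_beta(beta: float) -> float:
    """Power of a test given its Type II error rate beta."""
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must be a probability between 0 and 1")
    return 1.0 - beta


# Type II error rate of 5 % from the problem above.
power = power_from_beta(0.05)
print(f"Power = {power:.2f}")
```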

3. On average, professional golfers make 75 % of putts within 5 feet. One


golfer believes he does better than this, and wants to use a statistical test
to see whether or not he’s correct. Unbeknownst to him, in actuality this
golfer makes 7 out of 10 of these kinds of putts. When he takes a sample
of his putts, he finds p̂ = 0.92. What kind of error might he be in danger of
making?

Solution:

The golfer’s null and alternative hypotheses are

H0 : p ≤ 0.75

Ha : p > 0.75

In reality, his null hypothesis is true, but based on the sample proportion
p̂ = 0.92, he may be in danger of rejecting the null when he shouldn’t.

             H0 is true               H0 is false
Reject H0    Type I error             CORRECT
             P(Type I error) = α
Accept H0    CORRECT                  Type II error
                                      P(Type II error) = β

Which means the golfer is in danger of making a Type I error.

4. The average age of a guest at an amusement park is 15 years old. One


amusement park believes the average age of their guests is younger than
this, and wants to use a statistical test to see whether or not they’re
correct. Unbeknownst to them, in actuality the average guest age at this
particular amusement park is 12 years old. When they take a sample of their
guests, they find x̄ = 16 years. What kind of error might they be in danger
of making?

Solution:

The park’s null and alternative hypotheses are

H0 : μ ≥ 15

Ha : μ < 15

In reality, their null hypothesis is false, but based on the sample mean
x̄ = 16, they may be in danger of accepting the null when they shouldn’t.

             H0 is true               H0 is false
Reject H0    Type I error             CORRECT
             P(Type I error) = α
Accept H0    CORRECT                  Type II error
                                      P(Type II error) = β

Which means the amusement park is in danger of making a Type II error.

5. Of all political donations, 70 % come from corporations and lobbies,


not from individual citizens. One politician believes he receives less than
70 % of his own donations from corporations and lobbies, and wants to use
a statistical test to see whether or not he’s correct. Unbeknownst to him,
in actuality the proportion of his donations that come from corporations
and lobbies is 65 % . When he takes a sample of his donations, he finds
p̂ = 0.72 for the proportion that come from corporations and lobbies. What
kind of error might he be in danger of making?

Solution:

The politician’s null and alternative hypotheses are

H0 : p ≥ 0.7

Ha : p < 0.7

In reality, his null hypothesis is false, but based on the sample proportion
p̂ = 0.72, he may be in danger of accepting the null when he shouldn’t.

             H0 is true               H0 is false
Reject H0    Type I error             CORRECT
             P(Type I error) = α
Accept H0    CORRECT                  Type II error
                                      P(Type II error) = β

Which means the politician is in danger of making a Type II error.

6. A coffee shop owner believes that he sells 500 cups of coffee each
day, on average, and he wants to test this assumption. The truth is, he
actually sells fewer than 500 cups each day. He takes a random sample of
10 days and records the number of cups he sells each of those days. What
kind of error is the coffee shop owner in danger of making?

Day         1    2    3    4    5    6    7    8    9    10
Cups sold   488  502  496  506  492  489  510  511  506  500

Solution:

The owner’s null and alternative hypotheses are

H0 : μ = 500

Ha : μ ≠ 500

If we look at the data, we can see that the sample mean is x̄ = 500. In
reality, his null hypothesis is false, but based on the sample mean x̄ = 500,
he may be in danger of accepting the null when he shouldn’t.

             H0 is true               H0 is false
Reject H0    Type I error             CORRECT
             P(Type I error) = α
Accept H0    CORRECT                  Type II error
                                      P(Type II error) = β

Which means the coffee shop owner is in danger of making a Type II error.
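
The sample mean quoted in this solution can be verified directly from the ten daily counts. This is only an arithmetic check, and the variable names are illustrative.

```python
from statistics import mean

# Cups sold on each of the 10 sampled days, from the table in the problem.
cups_sold = [488, 502, 496, 506, 492, 489, 510, 511, 506, 500]

sample_mean = mean(cups_sold)
hypothesized_mean = 500

# The numerator of any test statistic is (sample mean - hypothesized mean).
difference = sample_mean - hypothesized_mean
print(sample_mean, difference)
```

Because the difference is zero, the test statistic is zero and the sample gives no grounds to reject H0, even though H0 is actually false. That is the Type II error described above.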

TEST STATISTICS FOR ONE- AND TWO-TAILED TESTS

1. A local high school states that its students perform much better than
average on a state exam. The average score for all high school students in
the state is 106 points. A sample of 256 students at this particular school
had an average test score of 129 points with a sample standard deviation
of 26.8. Choose and calculate the appropriate test statistic.

Solution:

The sample is comparing average scores, which means the population


parameter is a population mean (not a proportion) with an unknown
standard deviation (since we have the sample standard deviation and not
the population standard deviation).

The sample size is large enough at 256 high schoolers that we can assume
the distribution is approximately normal. In this case, we use a t-test
statistic.

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{129 - 106}{\frac{26.8}{\sqrt{256}}} \approx 13.73$$
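
The t-test statistic for a mean with an unknown population standard deviation can be computed with a short helper. This is a sketch; the function name `t_statistic` and its parameter names are assumptions made for illustration.

```python
import math


def t_statistic(sample_mean: float, mu0: float, sample_sd: float, n: int) -> float:
    """t = (x̄ − μ0) / (s / √n), using the sample standard deviation s."""
    standard_error = sample_sd / math.sqrt(n)
    return (sample_mean - mu0) / standard_error


# High school exam scores: x̄ = 129, μ0 = 106, s = 26.8, n = 256.
t_exam = t_statistic(129, 106, 26.8, 256)
print(round(t_exam, 2))

# Restaurant calories (the next problem): x̄ = 1,250, μ0 = 1,500, s = 350.2, n = 35.
t_calories = t_statistic(1250, 1500, 350.2, 35)
print(round(t_calories, 2))
```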

2. A dietician is looking into the claim at a local restaurant that the


number of calories in its portion sizes is lower than the national average.
The national average is 1,500 calories per meal. She samples 35 meals at

the restaurant and finds they contain an average of 1,250 calories per meal
with a sample standard deviation of 350.2. Choose and calculate the
appropriate test statistic.

Solution:

The sample is comparing average number of calories, which means the


population parameter is a population mean (not a proportion) with an
unknown standard deviation (since we have the sample standard deviation
and not the population standard deviation).

The sample size is large enough at 35 meals that we can assume the
distribution is approximately normal. In this case, we use a t-test statistic.

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{1{,}250 - 1{,}500}{\frac{350.2}{\sqrt{35}}} \approx -4.22$$

3. In a recent survey, 567 out of 768 randomly selected dog owners said
they used a kennel that was run by their veterinary office to board their
dogs while they were away on vacation. The study would like to make a
conclusion that the majority (more than 50 % ) of dog owners use a kennel
run by their veterinary office when the owners go on vacation. Choose and
calculate the appropriate test statistic.

Solution:

The sample size is large enough at 768 randomly selected individuals that
we can state the distribution is approximately normal. We can show this by
using the checks for the population proportion, np ≥ 10 and n(1 − p) ≥ 10.
The sample size is n = 768 and the sample proportion is

$$\hat{p} = \frac{567}{768} \approx 0.738$$

Therefore,

n p̂ = (768)(0.738) ≈ 567 ≥ 10

n(1 − p̂) = (768)(1 − 0.738) ≈ 201 ≥ 10

So we can say that the test statistic will be the z-test statistic for a
population proportion.

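$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.738 - 0.5}{\sqrt{\frac{0.5(1 - 0.5)}{768}}} \approx 13.21$$

(The value 13.21 comes from the unrounded p̂ = 567/768; using the rounded 0.738 gives about 13.19.)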
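
The normality checks and the z-test statistic for a proportion can be bundled into one small function. This is a minimal sketch under the workbook's own conditions (n p̂ ≥ 10 and n(1 − p̂) ≥ 10); the name `proportion_z` is hypothetical.

```python
import math


def proportion_z(successes: int, n: int, p0: float) -> tuple[float, bool, float]:
    """Return (p_hat, conditions_met, z) for a one-sample proportion test."""
    p_hat = successes / n
    # Normality checks used in the workbook, based on the sample proportion.
    conditions_met = n * p_hat >= 10 and n * (1 - p_hat) >= 10
    # The standard error uses the hypothesized proportion p0, not p_hat.
    standard_error = math.sqrt(p0 * (1 - p0) / n)
    z = (p_hat - p0) / standard_error
    return p_hat, conditions_met, z


# Dog owners: 567 of 768 use a veterinary kennel, tested against p0 = 0.5.
p_hat, ok, z = proportion_z(567, 768, 0.5)
print(round(p_hat, 3), ok, round(z, 2))
```

The same call with `proportion_z(243, 500, 0.5)` reproduces the day care problem that follows.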

4. We want to open a day care center, so we take a random sample of


500 households in our town with children under preschool age, and find
that 243 of them were using a family member to care for those children.
We want to determine if, at a statistically significant level, fewer than half
of households in our town are using a family member to care for the kids.

1. Set up the hypothesis statements.

2. Check that the conditions for normality are met.

3. State the type of test: upper-tailed, lower-tailed, or two-tailed.

4. Calculate the test statistic using the appropriate formula.

Solution:

The hypothesis statements would be

H0 : p ≥ 0.5

Ha : p < 0.5

We need to see if we have an approximately normal distribution by using


the checks for a population proportion. The sample size is from a simple
random sample of n = 500 households. The proportion is the 243 out of the
500 households, so p̂ = 243/500 = 0.486.

n p̂ = (500)(0.486) = 243 ≥ 10

n(1 − p̂) = (500)(1 − 0.486) = 257 ≥ 10

Because both values are greater than 10, the distribution is approximately
normal. This is a lower-tailed test because the alternative hypothesis uses
the < sign.

This is a population proportion, so we’ll calculate a z-test statistic for a


population proportion.

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.486 - 0.50}{\sqrt{\frac{0.5(1 - 0.5)}{500}}} \approx -0.6261$$

5. The highest allowable amount of bromate in drinking water is
0.0100 mg/L. A survey of a city’s water quality took 50 water samples in
random locations around the city and found an average of 0.0102 mg/L of
bromate with a sample standard deviation of 0.0025 mg/L. The survey
committee is interested in testing if the amount of bromate found in the
water samples is higher than the allowable amount at a statistically
significant level.

1. Set up the hypothesis statements.

2. Check that the conditions for normality are met.

3. State the type of test: upper-tailed, lower-tailed, or two-tailed.

4. Calculate the test statistic using the appropriate formula.

Solution:

The hypothesis statements would be

H0 : μ ≤ 0.0100

Ha : μ > 0.0100

The sample size is a simple random sample of 50 samples, so the


distribution is approximately normal. This is an upper-tailed test because
the alternative hypothesis uses the greater than sign.

This is a population mean with an unknown population standard deviation,
so we’ll calculate a t-test statistic with the population mean formula.

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{0.0102 - 0.0100}{\frac{0.0025}{\sqrt{50}}} \approx 0.5657$$

6. A farmer reads a study that states: The average weight of a day-old


chick upon hatching is μ0 = 38.60 grams with a population standard
deviation of σ = 5.7 grams. The farmer wants to see if her day-old chicks
have the same average. She takes a simple random sample of 60 of her
day-old chicks and finds their average weight is x̄ = 39.1 grams.

1. Set up the hypothesis statements.

2. Check that the conditions for normality are met.

3. State the type of test: upper-tailed, lower-tailed, or two-tailed.

4. Calculate the test statistic using the appropriate formula.

Solution:

The hypothesis statements would be

H0 : μ = 38.60

Ha : μ ≠ 38.60

The sample size is a simple random sample of 60 of her day-old chicks so
we can say the distribution is approximately normal. This is a two-tailed
test because the alternative hypothesis uses the ≠ sign.

This is a population mean with a known population standard deviation, so


we’ll calculate a z-test statistic with the population mean formula.

$$z = \frac{\bar{x} - \mu_0}{\frac{\sigma}{\sqrt{n}}} = \frac{39.10 - 38.60}{\frac{5.7}{\sqrt{60}}} \approx 0.6795$$

THE P-VALUE AND REJECTING THE NULL

1. A medical trial is conducted to test whether or not a new medicine


reduces total cholesterol, when the national average is 230 mg/dL with a
standard deviation of 16 mg/dL. The trial takes a simple random sample of
223 adults who take the new medicine, and finds x̄ = 227 mg/dL. What can
the trial conclude at a significance level of α = 0.01?

Solution:

The hypothesis statements will be

H0 : μ ≥ 230

Ha : μ < 230

The test statistic will be

$$z = \frac{\bar{x} - \mu}{\frac{\sigma}{\sqrt{n}}} = \frac{227 - 230}{\frac{16}{\sqrt{223}}} = -\frac{3\sqrt{223}}{16} \approx -2.80$$

From the z-table we get 0.0026.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-2.9 .0019 .0018 .0018 .0017 .0016 .0016 .0015 .0015 .0014 .0014

-2.8 .0026 .0025 .0024 .0023 .0023 .0022 .0021 .0021 .0020 .0019

-2.7 .0035 .0034 .0033 .0032 .0031 .0030 .0029 .0028 .0027 .0026

Because this is a lower-tail test, the p-value is just this value we found,
p = 0.0026. Comparing this to α = 0.01, we can see that p ≤ α, which means
we’ll reject the null hypothesis.

Therefore, at a significance level of α = 0.01, the trial can conclude that the
new medicine reduces cholesterol. Because the p-value is well below α, the
result would remain significant at any significance level down to
α = 0.0026.
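
Instead of reading the z-table, the lower-tail area can be computed with the standard normal CDF. The sketch below uses `statistics.NormalDist` from the Python standard library; the variable names are illustrative.

```python
import math
from statistics import NormalDist

# Cholesterol trial: x̄ = 227, μ0 = 230, σ = 16, n = 223, α = 0.01.
x_bar, mu0, sigma, n, alpha = 227, 230, 16, 223, 0.01

z = (x_bar - mu0) / (sigma / math.sqrt(n))

# Lower-tailed test: the p-value is the area to the left of z.
p_value = NormalDist().cdf(z)

reject_null = p_value <= alpha
print(round(z, 2), round(p_value, 4), reject_null)
```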

2. The national average length of pregnancy is 283.6 days with a


population standard deviation of 10.5 days. A hospital wants to know if the
average length of a pregnancy at their hospital deviates from the national
average. They use a sample of 9,411 births at the hospital to calculate a
test statistic of z = − 1.60. Set up the hypothesis statements and find the
p-value.

Solution:

The hospital wants to know if mean length of pregnancy at their hospital is


different than the national average in a significant way.

H0 : μ = 283.6

Ha : μ ≠ 283.6

Because the alternative hypothesis uses a ≠ sign, this is a two-tailed test.
We were told in the problem that the test statistic is z = − 1.60, so we’ll look
that up in the z-table.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367

-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455

-1.5 .0668 .0655 .0643 .0630 .0618 .0606 .0594 .0582 .0571 .0559

For the lower tail, z = − 1.60 gives an area of 0.0548. Now to calculate our
p-value, we multiply this by 2.

p = 2(0.0548)

p = 0.1096
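
The doubling step for a two-tailed test can be written as a one-line function. This is a sketch using the standard library's `NormalDist`; the name `two_tailed_p` is an assumption.

```python
from statistics import NormalDist


def two_tailed_p(z: float) -> float:
    """Two-tailed p-value: twice the tail area beyond |z|."""
    return 2 * NormalDist().cdf(-abs(z))


# Pregnancy length test statistic from the problem.
p = two_tailed_p(-1.60)
print(round(p, 4))
```

Taking `-abs(z)` means the function gives the same answer whether the test statistic is negative or positive.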

3. The highest allowable amount of bromate in drinking water is


0.0100 mg/L. A survey of a city’s water quality took 31 water samples in
random locations around the city and used the data to calculate a test
statistic of t = 2.04. The city wants to know if the amount of bromate in
their drinking water is too high. Set up the hypothesis statements and
determine the type of test, then find the p-value.

Solution:

The city wants to know if the amount of bromate in their drinking water is
higher than the allowable amount in a significant way.

H0 : μ ≤ 0.0100

Ha : μ > 0.0100

Because the alternative hypothesis uses a > sign, this is an upper-tailed


test. We were told in the problem that the test statistic is t = 2.04, so we’ll
look that up in the t-table, but we’ll also need to know the degrees of
freedom. We know the study included 31 samples, so degrees of freedom
is n − 1 = 31 − 1 = 30.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.408 3.674

29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.396 3.659

30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.385 3.646

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

Because t = 2.04 falls just below the critical value 2.042 in the row for 30
degrees of freedom, the p-value is just above p = 0.025, so we’ll approximate
the p-value as p ≈ 0.025.
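
The t-table only brackets the p-value. For a closer figure, the upper-tail area can be approximated by numerically integrating the Student's t density. The sketch below uses only the standard library and composite Simpson's rule; the function names and the step count are choices made for this example, not something the workbook prescribes.

```python
import math


def t_pdf(x: float, df: int) -> float:
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)


def t_upper_tail(t: float, df: int, steps: int = 2000) -> float:
    """Approximate P(T > t) for t >= 0 via Simpson's rule on [0, t]."""
    if t < 0:
        raise ValueError("pass a non-negative t; use symmetry for negative values")
    if steps % 2:
        raise ValueError("steps must be even for Simpson's rule")
    h = t / steps
    total = t_pdf(0.0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    # Half of the distribution lies above 0; subtract the part between 0 and t.
    return 0.5 - area_0_to_t


# Bromate problem: t = 2.04 with 31 - 1 = 30 degrees of freedom.
p_bromate = t_upper_tail(2.04, 30)
print(round(p_bromate, 4))
```

The result sits slightly above 0.025, which matches the table: t = 2.04 is a little smaller than the 0.025 critical value of 2.042.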

4. A paint company produces glow in the dark paint with an advertised


glow time of 15 min. A painter is interested in finding out if the product
behaves worse than advertised. She sets up her hypothesis statements as
H0 : μ ≥ 15 and Ha : μ < 15, then calculates a test statistic of z = − 2.30. What
would be the conclusions of her hypothesis test at significance levels of
α = 0.05, α = 0.01, and α = 0.001?

Solution:

We need to look up z = − 2.30 in the z-table.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-2.4 .0082 .0080 .0078 .0075 .0073 .0071 .0069 .0068 .0066 .0064

-2.3 .0107 .0104 .0102 .0099 .0096 .0094 .0091 .0089 .0087 .0084

-2.2 .0139 .0136 .0132 .0129 .0125 .0122 .0119 .0116 .0113 .0110

For a lower-tail test, the p-value is given by this value we found in the
z-table, so p = 0.0107.

We know that

If p ≤ α, reject the null hypothesis

If p > α, do not reject the null hypothesis

Therefore,

• For p = 0.0107 and α = 0.05, p ≤ α, so she’d reject the null

• For p = 0.0107 and α = 0.01, p > α, so she’d fail to reject the null

• For p = 0.0107 and α = 0.001, p > α, so she’d fail to reject the null
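
The decision rule applied in the three bullets above can be looped over the significance levels. This is a minimal sketch; the dictionary name `decisions` is illustrative.

```python
p_value = 0.0107  # lower-tail area for z = -2.30, read from the z-table

decisions = {}
for alpha in (0.05, 0.01, 0.001):
    # Reject the null when p <= alpha; otherwise fail to reject.
    decisions[alpha] = "reject H0" if p_value <= alpha else "fail to reject H0"

for alpha, decision in decisions.items():
    print(f"alpha = {alpha}: {decision}")
```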

5. An article reports that the average wasted time by an employee is 125


minutes every day. A manager takes a small random sample of 16
employees and monitors their wasted time, calculating that average
wasted time for her employees is 118 minutes with a standard deviation of
28.7 minutes. She wants to know if 118 minutes is below average at a

significance level of α = 0.05. She assumes the population is normally
distributed.

1. State the population parameter and whether a t-test or z-test


should be used.

2. Check that the conditions for performing the statistical test are
met.

3. Set up the hypothesis statements.

4. State the type of test: upper-tailed, lower-tailed, or two-tailed.

5. Calculate the test statistic using the appropriate formula.

6. Calculate the p-value.

7. Compare the p-value to the significance level and draw a


conclusion.

Solution:

This is a population mean with an unknown population standard deviation


because the manager is going to do her analysis based on the sample
standard deviation. She also has a small sample size of 16 employees. This
means we should use the t-test statistic because we have a small sample
size and also an unknown population standard deviation.

The conditions for performing a t-test with a population mean are an
approximately normal distribution and a simple random sample, and we’ve
been told in the problem that both of those conditions are met.

The manager wants to know if 118 minutes is below average. We’re


comparing 118 minutes to the stated average of 125 minutes. Since she
wants to know if her measurement is below average, we should use the
less than symbol in our alternative hypothesis.

H0 : μ ≥ 125

Ha : μ < 125

The test statistic will be

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{118 - 125}{\frac{28.7}{\sqrt{16}}} \approx -0.9756$$

The next step is to find the p-value by looking up the test statistic in the
t-table. To look up a t-value, we’ll also need to know the degrees of
freedom from the problem. We know the study included 16 employees, so
the degrees of freedom are 16 − 1 = 15.

We calculated the test statistic as t ≈ − 0.9756. We’re looking for the area in
the lower tail, but the table will give us the area in the upper tail when
t = 0.9756. Remember these values are equal because the t-curve is
symmetric. Now we look up where our test statistic and degrees of
freedom intersect. The value we read from the t-table is somewhere
between p = 0.20 and p = 0.15.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

14 0.692 0.868 1.076 1.345 1.761 2.145 2.624 2.977 3.787 4.140

15 0.691 0.866 1.074 1.341 1.753 2.131 2.602 2.947 3.733 4.073

16 0.690 0.865 1.071 1.337 1.746 2.120 2.583 2.921 3.686 4.015

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

Regardless of the exact value of p between p = 0.20 and p = 0.15, at a


significance level of α = 0.05, we can say p > α, so the manager will fail to
reject the null hypothesis: there’s not enough evidence to conclude that her
employees waste less time than the average of 125 minutes per day at the
significance level of α = 0.05.
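
Reading a bracket for the p-value off one row of the t-table can be mimicked in code. The sketch below hard-codes the df = 15 row from the table above; the names `TAIL_PROBS`, `DF15_CRITICAL`, and `bracket_p` are hypothetical.

```python
# Column headers (upper-tail probabilities) and the df = 15 row of the t-table.
TAIL_PROBS = [0.25, 0.20, 0.15, 0.10, 0.05, 0.025, 0.01, 0.005, 0.001, 0.0005]
DF15_CRITICAL = [0.691, 0.866, 1.074, 1.341, 1.753, 2.131, 2.602, 2.947, 3.733, 4.073]


def bracket_p(t_abs, critical_values, tail_probs):
    """Return (lower, upper) bounds on the one-tail p-value from a table row."""
    upper = 0.5  # the tail area beyond t = 0
    for critical, prob in zip(critical_values, tail_probs):
        if t_abs < critical:
            # t lies below this critical value, so p is above this column's area.
            return prob, upper
        upper = prob
    return 0.0, upper  # t is beyond the last column of the table


# The t-curve is symmetric, so look up |t| = 0.9756 for the lower-tail test.
low, high = bracket_p(0.9756, DF15_CRITICAL, TAIL_PROBS)
alpha = 0.05
fail_to_reject = low > alpha
print(low, high, fail_to_reject)
```

Since even the lower bound of the bracket exceeds α = 0.05, the exact p-value is not needed to reach the conclusion.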

6. We want to test if college students take fewer than 5 years to


graduate, on average, so we take a simple random sample of 30 students
and record their years to graduate. For the sample, x̄ = 4.9 and s = 0.5.
What can we conclude at 90 % confidence?

Solution:

The hypothesis statements will be

H0 : μ ≥ 5

Ha : μ < 5

Find the test statistic.

$$t = \frac{\bar{x} - \mu_0}{\frac{s}{\sqrt{n}}} = \frac{4.9 - 5}{\frac{0.5}{\sqrt{30}}} = \frac{-0.1}{\frac{0.5}{\sqrt{30}}} \approx -1.0954$$

The next step is to look up the test statistic in the t-table. To look up a
t-value, we’ll also need to know the degrees of freedom from the problem.
We know the study included 30 students, so the degrees of freedom are
30 − 1 = 29.

We calculated the test statistic as t ≈ − 1.0954. We’re looking for the area in
the lower tail, but the table will give us the area in the upper tail when
t = 1.0954. Remember these values are equal because the t-curve is
symmetric. Now we look up where our test statistic and degrees of
freedom intersect. The value we read from the t-table is somewhere
between p = 0.15 and p = 0.10.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

28 0.683 0.855 1.056 1.313 1.701 2.048 2.467 2.763 3.408 3.674

29 0.683 0.854 1.055 1.311 1.699 2.045 2.462 2.756 3.396 3.659

30 0.683 0.854 1.055 1.310 1.697 2.042 2.457 2.750 3.385 3.646

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

Regardless of the exact value of p between p = 0.15 and p = 0.10, at a


significance level of α = 0.10, we can say p > α, so we’ll fail to reject the null
hypothesis, because there’s not enough evidence that college students
take less than 5 years to graduate at a significance level of α = 0.1.

HYPOTHESIS TESTING FOR THE POPULATION PROPORTION

1. A large electric company claims that at least 80 % of the company’s


1,000,000 customers are very satisfied. Using a simple random sample, 100
customers were surveyed and 73 % of the participants were very satisfied.
Based on these results, should we use a one- or two- tailed test, and
should we accept or reject the company’s hypothesis? Assume a
significance level of 0.05.

Solution:

The first step is to state the null and alternative hypotheses for the survey.

H0 : p ≥ 0.80

Ha : p < 0.80

These hypotheses require a one-tailed test, specifically a lower-tail test.


The null hypothesis will be rejected only if the sample proportion is
significantly less than 80 % .

We calculate the standard error using the hypothesized proportion p0 = 0.80,

$$\sigma_{\hat{p}} = \sqrt{\frac{p_0(1 - p_0)}{n}} = \sqrt{\frac{0.8(1 - 0.8)}{100}} = 0.04$$

and then compute the z-score test statistic.

$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.73 - 0.80}{0.04} = -1.75$$

The z-table gives 0.0401 for a z-score of z = − 1.75.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-1.8 .0359 .0351 .0344 .0336 .0329 .0322 .0314 .0307 .0301 .0294

-1.7 .0446 .0436 .0427 .0418 .0409 .0401 .0392 .0384 .0375 .0367

-1.6 .0548 .0537 .0526 .0516 .0505 .0495 .0485 .0475 .0465 .0455

Since we have a one-tailed test, the p-value is p = 0.0401, and we were told
in the problem that α = 0.05. Because the p-value is less than the α-level,
p < α, we’ll reject the null hypothesis.
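
The whole lower-tailed proportion test for the electric company can be traced in a few lines. This is a sketch that computes the tail area with `statistics.NormalDist` rather than the printed z-table; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Electric company survey: n = 100, p̂ = 0.73, claimed p0 = 0.80, α = 0.05.
n, p_hat, p0, alpha = 100, 0.73, 0.80, 0.05

# Standard error under the null hypothesis.
standard_error = math.sqrt(p0 * (1 - p0) / n)

z = (p_hat - p0) / standard_error

# Lower-tailed test: the p-value is the area to the left of z.
p_value = NormalDist().cdf(z)

reject_null = p_value <= alpha
print(round(standard_error, 2), round(z, 2), round(p_value, 4), reject_null)
```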

2. A university is conducting a statistical test to determine whether the


percentage of its students who live on its campus is above the national
average of 64 % . They’ve calculated the test statistic to be z = 1.40. Set up
hypothesis statements and find the p-value.

Solution:

The university wants to know if the proportion of students who live on


campus is above the national average in a statistically significant way.

H0 : p ≤ 64 %

Ha : p > 64 %

Because the alternative hypothesis uses a > sign, this is a one-tail, upper-
tailed test. We were told that the test statistic is z = 1.40, so we’ll look that
up in the z-table.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

1.3 .9032 .9049 .9066 .9082 .9099 .9115 .9131 .9147 .9162 .9177

1.4 .9192 .9207 .9222 .9236 .9251 .9265 .9279 .9292 .9306 .9319

1.5 .9332 .9345 .9357 .9370 .9382 .9394 .9406 .9418 .9429 .9441

This is an upper-tail test, which means the p-value is the area outside of
0.9192, or

p = 1 − 0.9192

p = 0.0808

3. A report claims that 60 % of American families take fewer than 6


months to purchase a home, from the time they start looking to the time
they make their first offer. A realtor wants to know if her clients purchase
at the same rate, so she takes a simple random sample of 50 of her clients
and finds p̂ = 0.64, with standard error σp̂ = √0.0048 ≈ 0.0693. What can she
conclude with 90 % confidence?

Solution:

The first step is to state the null and alternative hypotheses for the survey.

H0 : p = 0.60

Ha : p ≠ 0.60

These hypotheses require a two-tailed test. The null hypothesis will be


rejected only if the sample proportion is significantly different than 60 % .

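The standard error is the square root of p0(1 − p0)/n = 0.60(0.40)/50 = 0.0048,
which is σp̂ = √0.0048 ≈ 0.0693, so the z-score will be

$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.64 - 0.60}{\sqrt{0.0048}} \approx \frac{0.04}{0.0693} \approx 0.58$$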

The z-table gives 0.7190 for a z-score of z = 0.58.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

0.4 .6554 .6591 .6628 .6664 .6700 .6736 .6772 .6808 .6844 .6879

0.5 .6915 .6950 .6985 .7019 .7054 .7088 .7123 .7157 .7190 .7224

0.6 .7257 .7291 .7324 .7357 .7389 .7422 .7454 .7486 .7517 .7549

The p-value is the area outside of 0.7190, or

p = 1 − 0.7190

p = 0.2810

Since we have a two-tailed test, the p-value is double this, or

p = 2(0.2810)

p = 0.5620

We were told in the problem that α = 0.10, so p ≥ α, which means the


realtor will fail to reject the null hypothesis. So she can’t say that her
clients purchase at a different rate than the report claims.
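
It's worth checking the standard error in this problem numerically, because the quantity p0(1 − p0)/n is the variance of p̂ and the standard error is its square root. The sketch below recomputes both and the two-tailed p-value with the standard library; variable names are illustrative.

```python
import math
from statistics import NormalDist

# Realtor's sample: n = 50, p̂ = 0.64, claimed p0 = 0.60, α = 0.10.
n, p_hat, p0, alpha = 50, 0.64, 0.60, 0.10

variance = p0 * (1 - p0) / n          # variance of p̂ under the null
standard_error = math.sqrt(variance)  # standard error is the square root

z = (p_hat - p0) / standard_error

# Two-tailed test: double the area beyond |z|.
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

fail_to_reject = p_value > alpha
print(round(variance, 4), round(standard_error, 4), round(z, 2), fail_to_reject)
```

The unrounded z gives a p-value of roughly 0.564 rather than the table-based 0.5620, because the table lookup rounds z to 0.58 first. The conclusion is the same either way.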

4. A gambler wins 48 % of the hands he plays, but he feels like he’s on a
losing streak recently, winning fewer hands than normal. He takes a
random sample of 40 of his recent hands, and finds the proportion of
winning hands in the sample to be p̂ = 0.45 with σp̂ = √0.00624 ≈ 0.0790. What can he
conclude with 90 % confidence?

Solution:

The first step is to state the null and alternative hypotheses for the survey.

H0 : p ≥ 0.48

Ha : p < 0.48

These hypotheses require a one-tailed test, specifically a lower-tail test.


The null hypothesis will be rejected only if the sample proportion is
significantly lower than 48 % .

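The standard error is the square root of p0(1 − p0)/n = 0.48(0.52)/40 = 0.00624,
which is σp̂ = √0.00624 ≈ 0.0790, so the z-score test statistic will be

$$z = \frac{\hat{p} - p_0}{\sigma_{\hat{p}}} = \frac{0.45 - 0.48}{\sqrt{0.00624}} \approx \frac{-0.03}{0.0790} \approx -0.38$$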

The z-table gives 0.3520 for a z-score of z = − 0.38.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-0.4 .3446 .3409 .3372 .3336 .3300 .3264 .3228 .3192 .3156 .3121

-0.3 .3821 .3783 .3745 .3707 .3669 .3632 .3594 .3557 .3520 .3483

This is a lower-tail test, which means this is also the p-value, p = 0.3520.

We were told in the problem that α = 0.10. Because the p-value is greater
than the α-level, p > α, the gambler will fail to reject the null hypothesis,
and can’t conclude that he’s been on a losing streak at a statistically
significant level.

5. A study claims that the proportion of new homeowners who purchase an internet subscription plan is 0.92. We take a random sample of 140 new homeowners to test this claim, and find p̂ = 0.9 with σp̂ ≈ 0.0229. What can we conclude at a significance level of α = 0.05?

Solution:

The first step is to state the null and alternative hypotheses for the survey.

H0 : p = 0.92

Ha : p ≠ 0.92

These hypotheses require a two-tailed test. The null hypothesis will be rejected only if the sample proportion is significantly different than 92 %.

The z-score test statistic will be

$$z = \frac{\hat{p} - p}{\sigma_{\hat{p}}} = \frac{0.90 - 0.92}{0.0229} = -\frac{0.02}{0.0229} \approx -0.87$$

The z-table gives 0.1922 for a z-score of z ≈ − 0.87.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

-0.9 .1841 .1814 .1788 .1762 .1736 .1711 .1685 .1660 .1635 .1611

-0.8 .2119 .2090 .2061 .2033 .2005 .1977 .1949 .1922 .1894 .1867

-0.7 .2420 .2389 .2358 .2327 .2296 .2266 .2236 .2206 .2177 .2148

Since this is a two-tailed test, we double this to find the p-value, and we
get

p = 2(0.1922)

p = 0.3844

We were told in the problem that α = 0.05. Because the p-value is greater than the α-level, p > α, we’ll fail to reject the null hypothesis, which means that we can’t conclude that the proportion of new homeowners who purchase an internet subscription plan is different than 92 %.

6. A recent study reported that 15.3 % of patients who are admitted to the hospital with a heart attack die within 30 days of admission. The same study reported that 16.7 % of the 3,153 patients who went to the hospital with a heart attack died within 30 days of admission when the lead cardiologist was away.

Is there enough evidence to conclude that the percentage of patients who die when the lead cardiologist is away is any different than when they’re present? Make conclusions at significance levels of α = 0.05 and α = 0.01.

1. State the population parameter and whether a t-test or z-test
should be used.

2. Check that the conditions for performing the statistical test are
met.

3. Set up the hypothesis statements.

4. State the type of test: upper-tailed, lower-tailed, or two-tailed.

5. Calculate the test statistic using the appropriate formula.

6. Calculate the p-value.

7. Compare the p-value to the significance level and draw a conclusion.

Solution:

This is a population proportion because the data is looking at the proportion of heart attack patients admitted to the hospital who die within 30 days of admission.

The sample size is large at 3,153 with a sample proportion of 16.7 %, but to continue with the test we need to assume that the sample was a simple random sample (since it’s not stated in the problem).

This sample size is large enough to meet the conditions:

np̂ = (3,153)(0.167) ≈ 527 ≥ 10

n(1 − p̂) = (3,153)(1 − 0.167) ≈ 2,626 ≥ 10

When these two conditions are met, then the distribution is approximately
normal. Then we can continue with the hypothesis test.

According to the problem, we want to know if the percentage of patients who went to the hospital with a heart attack and died within 30 days of admission when the leading cardiologist was away differs from when they were not away. This means we need to use the ≠ symbol in our hypothesis statement.

H0 : p = 0.153

Ha : p ≠ 0.153

Since we’re dealing with a population proportion, the z-test statistic will be

$$z = \frac{\hat{p} - p_0}{\sqrt{\frac{p_0(1 - p_0)}{n}}} = \frac{0.167 - 0.153}{\sqrt{\frac{0.153(1 - 0.153)}{3{,}153}}} \approx 2.1837$$

The next step is to find the p-value by looking up the test statistic in the z-table. Since this is a two-tailed test, we’ll need to double the area we find in either the upper or lower tail. Rounding to z ≈ 2.18, the z-table gives a value of 0.9854.

z .00 .01 .02 .03 .04 .05 .06 .07 .08 .09

2.0 .9772 .9778 .9783 .9788 .9793 .9798 .9803 .9808 .9812 .9817

2.1 .9821 .9826 .9830 .9834 .9838 .9842 .9846 .9850 .9854 .9857

2.2 .9861 .9864 .9868 .9871 .9875 .9878 .9881 .9884 .9887 .9890

But this is the area to the left of the test statistic, not the area in the upper tail. Before we can do anything else we need to find the area in the upper tail. The total area under the curve is 1, so we’ll subtract this value from 1.

1 − 0.9854 = 0.0146

Now to calculate the p-value, we multiply the upper tail by 2.

p = 2(0.0146)

p = 0.0292

We know that

If p ≤ α, reject the null hypothesis

If p > α, do not reject the null hypothesis

Therefore,

• For p = 0.0292 and α = 0.05, p ≤ α, so we’d reject the null

• For p = 0.0292 and α = 0.01, p > α, so we’d fail to reject the null

This means there’s enough evidence to conclude that the percentage of patients who went to the hospital with a heart attack and died within 30 days of admission when the leading cardiologist was away is different than when the leading cardiologist is present, at a significance level of α = 0.05, but not at α = 0.01.
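As a rough cross-check of this solution, the sketch below recomputes the condition check, the test statistic, and the two-tailed p-value, then compares the p-value against both significance levels. The variable names are illustrative, and the exact normal CDF is used instead of the rounded z-table entry, so the p-value comes out slightly below the 0.0292 found from the table.

```python
from math import sqrt
from statistics import NormalDist

n, p0, p_hat = 3153, 0.153, 0.167

# Normal-approximation check, using the sample proportion as the solution does
conditions_ok = n * p_hat >= 10 and n * (1 - p_hat) >= 10

# z-test statistic with the standard error computed under the null proportion
z = (p_hat - p0) / sqrt(p0 * (1 - p0) / n)

# Two-tailed p-value: double the area beyond |z|
p_value = 2 * (1 - NormalDist().cdf(abs(z)))

# Reject the null whenever p <= alpha
decisions = {alpha: p_value <= alpha for alpha in (0.05, 0.01)}
print(conditions_ok, round(z, 4), round(p_value, 4), decisions)
```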

CONFIDENCE INTERVAL FOR THE DIFFERENCE OF MEANS

1. A researcher wants to compare the effectiveness of a new blood pressure medication for males and females. He takes a simple random sample of 25 males and 25 females and finds an average drop in blood pressure of 4.5 with a standard deviation of 0.35 for males, and an average drop in blood pressure of 4.85 with a standard deviation of 0.22 for females. Can he use pooled standard deviation to find the confidence interval?

Solution:

The sample variances are $s_1^2 = 0.35^2 = 0.1225$ and $s_2^2 = 0.22^2 = 0.0484$. The variance $s_1^2$ is more than twice the other variance, so the researcher should assume unequal population variances, which means he can’t use pooled standard deviation.

2. A grocery store wants to know whether families of 3 spend more on groceries than families of 2. They randomly survey ten 3-person families and find a mean weekly grocery spend of $258 with a standard deviation of $22, then randomly survey ten 2-person families and find a mean weekly grocery spend of $252 with a standard deviation of $26. Calculate the number of degrees of freedom.

Solution:

The sample variances are $s_1^2 = 22^2 = 484$ and $s_2^2 = 26^2 = 676$, and neither sample variance is more than twice the other, so we can assume equal population variances.

Which means that the degrees of freedom will be given by

df = n1 + n2 − 2

df = 10 + 10 − 2

df = 18

3. For the last question, calculate a 95 % confidence interval around the difference in mean weekly grocery spending for 3- and 2-person families.

Solution:

Because we already determined in the previous solution that we’re working with equal population variances, we’ll calculate pooled standard deviation.

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

$$s_p = \sqrt{\frac{(10 - 1)22^2 + (10 - 1)26^2}{10 + 10 - 2}}$$

$$s_p = \sqrt{\frac{9(22^2) + 9(26^2)}{18}}$$

$$s_p = \sqrt{\frac{22^2 + 26^2}{2}}$$

$$s_p \approx 24.083$$

At 95 % confidence and df = 18, the t-table gives 2.101.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

17 0.689 0.863 1.069 1.333 1.740 2.110 2.567 2.898 3.646 3.965

18 0.688 0.862 1.067 1.330 1.734 2.101 2.552 2.878 3.610 3.922

19 0.688 0.861 1.066 1.328 1.729 2.093 2.539 2.861 3.579 3.883

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

Then the confidence interval is

$$(a, b) = (\bar{x}_1 - \bar{x}_2) \pm t_{\alpha/2} \times s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$(a, b) \approx (258 - 252) \pm 2.101 \times 24.083\sqrt{\frac{1}{10} + \frac{1}{10}}$$

(a, b) ≈ 6 ± 22.63

Therefore, we can say that the confidence interval is

(a, b) ≈ (6 − 22.63,6 + 22.63)

(a, b) ≈ (−16.63,28.63)

We can be 95 % confident that the true difference of mean weekly grocery spending for 3- and 2-person families will fall between −$16.63 and $28.63. But because the confidence interval contains 0, we can’t conclude that there’s a difference between the population means.
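The pooled interval above can be reproduced with a short Python sketch. The function name `pooled_ci` is hypothetical, and the critical value 2.101 is passed in by hand from the t-table, since Python's standard library doesn't provide t-distribution quantiles.

```python
from math import sqrt

def pooled_ci(mean1, s1, n1, mean2, s2, n2, t_crit):
    """Pooled-variance confidence interval for a difference of means.

    Hypothetical helper: t_crit must be looked up for df = n1 + n2 - 2.
    """
    sp = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    margin = t_crit * sp * sqrt(1 / n1 + 1 / n2)
    diff = mean1 - mean2
    return sp, (diff - margin, diff + margin)

# Grocery data: 3-person families vs 2-person families, df = 18, 95 % confidence
sp, (low, high) = pooled_ci(258, 22, 10, 252, 26, 10, t_crit=2.101)
print(round(sp, 3), round(low, 2), round(high, 2))
print("interval contains 0:", low < 0 < high)
```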

4. A researcher is interested in whether a new fitness program lowers systolic blood pressure. He enrolls 50 participants into the study and randomly splits them into two groups of 25 each. The first group kept their same physical activity habits, while the second group followed the new fitness program. After a month, the mean systolic blood pressure in the group of exercisers was 123 with a standard deviation of 4, and the mean systolic pressure in the group of non-exercisers was 131 with a standard deviation of 5.5. Calculate the margin of error at 99 % confidence.

Solution:

The sample variances are $s_1^2 = 4^2 = 16$ and $s_2^2 = 5.5^2 = 30.25$, and neither sample variance is more than twice the other, so we’ll assume equal population variances, which means the degrees of freedom will be

df = n1 + n2 − 2

df = 25 + 25 − 2

df = 48

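The pooled standard deviation will be

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

$$s_p = \sqrt{\frac{(25 - 1)(16) + (25 - 1)(30.25)}{25 + 25 - 2}}$$

$$s_p = \sqrt{\frac{384 + 726}{48}}$$

$$s_p = \sqrt{23.125} \approx 4.809$$

At df = 48 and 99 % confidence, the t-table gives t = 2.682. Now we can calculate the margin of error as

$$ME = t_{\alpha/2} \times s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}$$

$$ME = 2.682 \times \sqrt{23.125}\sqrt{\frac{1}{25} + \frac{1}{25}}$$

$$ME \approx 2.682\sqrt{1.85}$$

$$ME \approx 3.6479$$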

5. Given population standard deviations σ1 = 2.25 and σ2 = 2.02, with sample means x̄1 = 14.5 and x̄2 = 13.6 and sample sizes n1 = 250 and n2 = 250, calculate a 90 % confidence interval around the difference of means.

Solution:

A 90 % confidence level is associated with z-scores of z = ±1.65, so the confidence interval will be

$$(a, b) = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$(a, b) = (14.5 - 13.6) \pm 1.65\sqrt{\frac{2.25^2}{250} + \frac{2.02^2}{250}}$$

$$(a, b) \approx 0.9 \pm 0.316$$

Therefore, we can say that the confidence interval is

(a, b) ≈ (0.9 − 0.316,0.9 + 0.316)

(a, b) ≈ (0.584,1.216)
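When the population standard deviations are known, the same interval can be sketched in Python with the normal quantile computed directly. The function name `z_interval_diff_means` is made up for this example. Note that `NormalDist().inv_cdf` returns the unrounded critical value (about 1.645 rather than 1.65), so the endpoints differ from the hand calculation in the third decimal place.

```python
from math import sqrt
from statistics import NormalDist

def z_interval_diff_means(mean1, sigma1, n1, mean2, sigma2, n2, confidence):
    """z-based confidence interval for a difference of means (sigmas known).

    Hypothetical helper for illustration only.
    """
    # Two-sided critical value, e.g. confidence = 0.90 -> quantile at 0.95
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sqrt(sigma1**2 / n1 + sigma2**2 / n2)
    diff = mean1 - mean2
    return diff - margin, diff + margin

low, high = z_interval_diff_means(14.5, 2.25, 250, 13.6, 2.02, 250, confidence=0.90)
print(round(low, 3), round(high, 3))
```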

6. Owners of a large shopping center want to determine whether or not there’s a difference in the amount of time that men and women spend per visit to the shopping center. Previous studies showed a standard deviation of 0.4 hours for men and 0.2 hours for women. The owners sample 500 men and 500 women and find that the mean time spent per visit was 1.6 hours for men and 2.5 hours for women. Find a 98 % confidence interval around the difference of means.

Solution:

The z-values associated with a 98 % confidence level are z = ±2.33, so because the population standard deviations are known, the confidence interval will be given by

$$(a, b) = (\bar{x}_1 - \bar{x}_2) \pm z_{\alpha/2}\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}$$

$$(a, b) = (1.6 - 2.5) \pm 2.33\sqrt{\frac{0.4^2}{500} + \frac{0.2^2}{500}}$$

$$(a, b) = -0.9 \pm 0.0466$$

Therefore, we can say that the confidence interval is

(a, b) = (−0.9 − 0.0466, − 0.9 + 0.0466)

(a, b) = (−0.9466, − 0.8534)

We can be 98 % confident that the true difference between mean time spent in the shopping center per visit by men and women will fall between −0.95 and −0.85 hours. Therefore, we’ve provided support for the hypothesis that men spend less time in the shopping center per visit than women.

HYPOTHESIS TESTING FOR THE DIFFERENCE OF MEANS

1. An ice cream shop owner believes his average daily revenue is higher in August than it is in September. He calculated average daily revenue of $496 in August and $456 in September, with standard deviations of $14 and $21.5, respectively. What can he conclude at a 0.05 significance level using a p-value approach?

Solution:

The sample variances are $s_1^2 = 14^2 = 196$ and $s_2^2 = 21.5^2 = 462.25$, so the second sample variance is more than twice the first. Therefore, because the sample variances are unequal, we can assume unequal population variances. Additionally, we have large samples nA = 31 and nS = 30, since there are 31 days in August and 30 days in September, so we’ll use a z-test.

We’ll run an upper-tailed test because the shop owner believes the
average daily revenue in August was higher than in September.

H0 : μA − μS ≤ 0; the average daily revenue in August is not higher than in September.

Ha : μA − μS > 0; the average daily revenue in August is higher than in September.

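Then the test statistic is

$$z = \frac{\bar{x}_A - \bar{x}_S}{\sqrt{\frac{s_A^2}{n_A} + \frac{s_S^2}{n_S}}}$$

$$z = \frac{496 - 456}{\sqrt{\frac{14^2}{31} + \frac{21.5^2}{30}}}$$

$$z = \frac{40}{\sqrt{\frac{196}{31} + \frac{462.25}{30}}}$$

$$z \approx 8.581$$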

The test statistic is much larger than the largest z-value in the z-table, so
we could say that the probability of finding z ≈ 8.581 is almost 0. Therefore,
p ≤ α and we can reject the null hypothesis, concluding that the average
daily revenue in August was higher than in September.
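A minimal Python sketch of this unpooled two-sample z-test is shown below. The helper name `two_sample_z` is an assumption made for the example. The upper-tail p-value is computed from the normal CDF, which is so small here that it rounds to 0 in floating-point arithmetic.

```python
from math import sqrt
from statistics import NormalDist

def two_sample_z(mean1, sd1, n1, mean2, sd2, n2):
    """z statistic for a difference of means with unequal variances.

    Hypothetical helper; appropriate only for large samples.
    """
    return (mean1 - mean2) / sqrt(sd1**2 / n1 + sd2**2 / n2)

# August (31 days) vs September (30 days)
z = two_sample_z(496, 14, 31, 456, 21.5, 30)
p_value = 1 - NormalDist().cdf(z)    # upper-tailed test; effectively 0 here
alpha = 0.05
print(round(z, 3), p_value, "reject H0" if p_value <= alpha else "fail to reject H0")
```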

2. A fitness coach wants to determine whether his new weight loss program is more effective than his old program. He randomly samples 50 of his clients following each program, and finds a mean weight loss of 5.5 pounds with a standard deviation of 1.05 pounds for those following the old program, and a mean weight loss of 6.12 pounds with a standard deviation of 0.95 pounds for those following the new program. Using a critical value approach, what can the coach conclude at a 0.01 level of significance?

Solution:

The sample variances are $s_1^2 = 1.05^2 = 1.1025$ and $s_2^2 = 0.95^2 = 0.9025$. Neither sample variance is more than twice the other, which means we can assume equal population variances.

The null and alternative hypotheses for the upper-tailed test are

H0 : μ1 − μ2 ≤ 0

Ha : μ1 − μ2 > 0

Since both samples are larger than 30 and the population variances are equal, we can calculate pooled standard deviation.

$$s_p = \sqrt{\frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}}$$

$$s_p = \sqrt{\frac{(50 - 1)1.05^2 + (50 - 1)0.95^2}{50 + 50 - 2}}$$

$$s_p = \sqrt{\frac{49(1.05^2) + 49(0.95^2)}{98}}$$

$$s_p = \sqrt{\frac{1.05^2 + 0.95^2}{2}}$$

$$s_p \approx 1.00125$$

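Then the z-statistic is

$$z = \frac{\bar{x}_1 - \bar{x}_2}{s_p\sqrt{\frac{1}{n_1} + \frac{1}{n_2}}}$$

$$z = \frac{6.12 - 5.5}{1.00125\sqrt{\frac{1}{50} + \frac{1}{50}}}$$

$$z = \frac{0.62}{1.00125\sqrt{\frac{1}{25}}}$$

$$z \approx 3.10$$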

For an upper-tailed test and α = 0.01, the critical z-value is z = 2.33. Since
3.10 > 2.33, the coach can reject the null hypothesis and conclude that the
new weight loss program is more effective than the old program.

3. Test the claim that, in 2006, the mean weight of men in the US was not
significantly different from the mean weight of women. Previous research
showed population standard deviations were 10.25 pounds for men and
8.58 pounds for women. A random sample of 1,500 men has a mean weight
of 193.5 pounds and a random sample of 1,500 women has a mean weight
of 185.3 pounds. Assuming the population variances are unequal, use a p-value approach to formulate a decision at the 0.05 significance level.

Solution:

Given the sample mean x̄m = 193.5 and population standard deviation σm = 10.25 for men, and the sample mean x̄w = 185.3 and population standard deviation σw = 8.58 for women, the null and alternative hypotheses for the two-tailed test will be

H0 : μm − μw = 0

Ha : μm − μw ≠ 0

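With unequal population variances and large samples, the test statistic will be

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}}}$$

$$z = \frac{193.5 - 185.3}{\sqrt{\frac{10.25^2}{1{,}500} + \frac{8.58^2}{1{,}500}}}$$

$$z = \frac{8.2}{\sqrt{\frac{105.0625 + 73.6164}{1{,}500}}}$$

$$z = 8.2\sqrt{\frac{1{,}500}{105.0625 + 73.6164}}$$

$$z \approx 23.76$$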

The test statistic is much larger than the largest z-value in the z-table, so we could say that the probability of finding z ≈ 23.76 is almost 0. Therefore, p ≤ α and we can reject the null hypothesis, concluding that the mean weight of men and women was significantly different.

4. A research team wants to determine whether men and women drink a different amount of water each day. They randomly sample 25 men and 25 women and find that the men consumed a mean of 1.48 liters of water with a standard deviation of 0.13 liters, and that the women consumed a mean of 1.62 liters of water with a standard deviation of 0.20 liters. Using a critical value approach, what can the research team conclude at a 0.10 level of significance?

Solution:

The null and alternative hypotheses for the two-tailed test will be

H0 : μm − μw = 0

Ha : μm − μw ≠ 0

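With small samples and unequal population variances ($s_2^2 = 0.2^2 = 0.04$ is more than twice $s_1^2 = 0.13^2 = 0.0169$), the t-statistic is

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

$$t = \frac{1.48 - 1.62}{\sqrt{\frac{0.0169}{25} + \frac{0.04}{25}}}$$

$$t = \frac{-0.14}{\sqrt{\frac{0.0569}{25}}}$$

$$t \approx -2.9346$$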

The number of degrees of freedom will be

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$

$$df = \frac{\left(\frac{0.0169}{25} + \frac{0.04}{25}\right)^2}{\frac{1}{25 - 1}\left(\frac{0.0169}{25}\right)^2 + \frac{1}{25 - 1}\left(\frac{0.04}{25}\right)^2}$$

$$df \approx 41.21$$

Rounding down to df = 41 with α = 0.10, we find critical t-values of approximately t = ±1.68. Since −2.9346 < −1.68, the research team can reject the null hypothesis and conclude that there’s a difference in the mean amount of water that men and women drink each day.

5. Given x̄1 = 23.55 and x̄2 = 20.12 with s1 = 2.3, s2 = 2.9, n1 = 10, and n2 = 15,
determine whether the two population means differ significantly. Using a
critical value approach, and assuming population standard deviations are
unequal, what can we conclude at a 0.01 level of significance?

Solution:

We want to determine whether there’s a difference in population means, so we need to use a two-tailed test, and our hypothesis statements will be

H0 : μ1 − μ2 = 0

Ha : μ1 − μ2 ≠ 0

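With small samples and unequal population variances, we should calculate a t-statistic.

$$t = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

$$t = \frac{23.55 - 20.12}{\sqrt{\frac{2.3^2}{10} + \frac{2.9^2}{15}}}$$

$$t = \frac{3.43}{\sqrt{\frac{5.29}{10} + \frac{8.41}{15}}}$$

$$t \approx 3.286$$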

Calculate the number of degrees of freedom.

$$df = \frac{\left(\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}\right)^2}{\frac{1}{n_1 - 1}\left(\frac{s_1^2}{n_1}\right)^2 + \frac{1}{n_2 - 1}\left(\frac{s_2^2}{n_2}\right)^2}$$

$$df = \frac{\left(\frac{2.3^2}{10} + \frac{2.9^2}{15}\right)^2}{\frac{2.3^4}{10^2(10 - 1)} + \frac{2.9^4}{15^2(15 - 1)}}$$

$$df \approx 22.17$$

Rounding down to the nearest whole number gives df = 22. From the t-table, we find that the critical t-value for the two-tailed test with α = 0.01 and df = 22 is t = 2.819.

Because 3.286 > 2.819, we can reject the null hypothesis and conclude that
there’s a significant difference in population means.
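The unequal-variance t-statistic and its degrees of freedom are tedious to compute by hand, so a small Python sketch may help as a check. The function name `welch_t` is chosen for this example only, and the critical value 2.819 is taken from the t-table rather than computed, since the standard library has no t-distribution.

```python
from math import floor, sqrt

def welch_t(mean1, s1, n1, mean2, s2, n2):
    """Return (t, df) for a two-sample t-test with unequal variances.

    Hypothetical helper; df uses the formula from the solution above.
    """
    v1, v2 = s1**2 / n1, s2**2 / n2            # per-sample variance of the mean
    t = (mean1 - mean2) / sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
    return t, df

t, df = welch_t(23.55, 2.3, 10, 20.12, 2.9, 15)
t_crit = 2.819                                  # two-tailed, alpha = 0.01, df = 22
print(round(t, 3), round(df, 2), floor(df))
print("reject H0" if abs(t) > t_crit else "fail to reject H0")
```

Rounding the degrees of freedom down, as the solution does, is the conservative choice because it gives a slightly larger critical value.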

6. John claims that the temperature in July is higher than the temperature in August. He recorded the temperature daily at 12:00 p.m. throughout July and August. He found a mean temperature of 28.4 °C with a standard deviation of 2.1 °C in July, and a mean temperature of 27.3 °C with a standard deviation of 1.7 °C in August. Using a critical value approach and assuming the population variances are unequal, what can John conclude at a 0.05 level of significance?

Solution:

Using μ1, x̄1, and s1 for July and μ2, x̄2, and s2 for August, the hypothesis statements for John’s upper-tailed test will be

H0 : μ1 − μ2 ≤ 0

Ha : μ1 − μ2 > 0

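With large samples (n1 = n2 = 31 days) and unequal population variances, John’s test statistic will be

$$z = \frac{\bar{x}_1 - \bar{x}_2}{\sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}}$$

$$z = \frac{28.4 - 27.3}{\sqrt{\frac{2.1^2}{31} + \frac{1.7^2}{31}}}$$

$$z = 1.1\sqrt{\frac{31}{4.41 + 2.89}}$$

$$z \approx 2.27$$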

The critical z-value for α = 0.05 with an upper-tailed test is z = 1.65. Since
2.27 > 1.65, John can reject the null hypothesis and conclude that the mean
temperature in July is higher than in August.

MATCHED-PAIR HYPOTHESIS TESTING

1. A golf club manufacturer claims that their new driver delivers 15 yards
of extra driving distance. They record the before and after driving
distances of 10 top professional players.

Player 1 2 3 4 5 6 7 8 9 10

Before x1 303 308 295 305 301 312 287 294 300 301

After x2 307 320 297 315 305 316 299 302 307 315

Difference, d 4 12 2 10 4 4 12 8 7 14

d2 16 144 4 100 16 16 144 64 49 196

Can the manufacturer conclude at a 5 % significance level that their driver delivers 15 yards of extra driving distance?

Solution:

The manufacturer will define the “before” responses as Population 1, and the “after” responses as Population 2, and their null and alternative hypotheses will be

H0 : μ2 − μ1 ≤ 15

Ha : μ2 − μ1 > 15

where μ1 is the mean driving distance with the players’ current drivers, and
μ2 is the mean driving distance with the manufacturer’s new driver. And
because μ2 − μ1 is the difference in distance, the hypothesis statements
could also be written as

H0 : μd ≤ 15

Ha : μd > 15

where μd is the mean difference between the two populations.

To find the mean difference, we’ll sum the differences and divide by the number of matched pairs in our sample, n = 10.

$$\bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{4 + 12 + 2 + 10 + 4 + 4 + 12 + 8 + 7 + 14}{10} = \frac{77}{10} = 7.7$$

So the sample mean tells us that the mean distance gained is 7.7 yards. Then the sample standard deviation is

$$s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}}$$

To calculate this, we’ll first find

$$\sum_{i=1}^n (d_i - \bar{d})^2$$

$$(4 - 7.7)^2 + (12 - 7.7)^2 + (2 - 7.7)^2 + (10 - 7.7)^2 + (4 - 7.7)^2$$

$$+ (4 - 7.7)^2 + (12 - 7.7)^2 + (8 - 7.7)^2 + (7 - 7.7)^2 + (14 - 7.7)^2$$

$$(-3.7)^2 + 4.3^2 + (-5.7)^2 + 2.3^2 + (-3.7)^2 + (-3.7)^2 + 4.3^2 + 0.3^2 + (-0.7)^2 + 6.3^2$$

$$13.69 + 18.49 + 32.49 + 5.29 + 13.69 + 13.69 + 18.49 + 0.09 + 0.49 + 39.69$$

$$156.1$$

Then the sample standard deviation is

$$s_d = \sqrt{\frac{156.1}{9}}$$

$$s_d \approx \sqrt{17.34}$$

$$s_d \approx 4.165$$

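Because the population standard deviation of the differences is unknown and the sample of matched pairs is small, n = 10 < 30, the test statistic will be

$$t = \frac{\bar{d} - \mu_d}{\frac{s_d}{\sqrt{n}}}$$

$$t \approx \frac{7.7 - 15}{\frac{4.165}{\sqrt{10}}}$$

$$t \approx -7.3 \cdot \frac{\sqrt{10}}{4.165}$$

$$t \approx -5.543$$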

and the degrees of freedom are

df = n − 1 = 10 − 1 = 9

At a significance level of 5 % for an upper-tailed test (an upper-tail probability of 0.05), and df = 9, the t-table gives 1.833.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

The manufacturer’s t-test statistic t ≈ −5.543 is less than the critical value t = 1.833, so the critical value approach tells them that they can’t reject the null hypothesis, and therefore can’t conclude that their new driver adds 15 yards of extra distance for the professional players.
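The matched-pair calculation above can be sketched in Python using the standard library's `mean` and `stdev` (which uses the n − 1 denominator, matching the sample standard deviation formula). The variable names are illustrative, and the critical value 1.833 is copied from the t-table.

```python
from math import sqrt
from statistics import mean, stdev

before = [303, 308, 295, 305, 301, 312, 287, 294, 300, 301]
after = [307, 320, 297, 315, 305, 316, 299, 302, 307, 315]

# Work with the paired differences, not the two samples separately
diffs = [a - b for a, b in zip(after, before)]
n = len(diffs)
d_bar = mean(diffs)                 # mean difference
s_d = stdev(diffs)                  # sample standard deviation of differences

mu_d = 15                           # hypothesized mean difference under H0
t = (d_bar - mu_d) / (s_d / sqrt(n))

t_crit = 1.833                      # upper-tailed, alpha = 0.05, df = 9
reject = t > t_crit
print(d_bar, round(s_d, 3), round(t, 3), reject)
```

Changing `mu_d` and `t_crit` adapts the same steps to the other matched-pair problems in this section.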

2. A car company believes that the changes they’ve made to their hybrid engine will increase miles per gallon by 4. On each of 10 routes, they send out one car with the old engine and one car with the new engine to drive the same route, and record the miles per gallon of each pair of cars.

Route 1 2 3 4 5 6 7 8 9 10

Old engine 39 39 38 42 44 43 42 47 47 47
New engine 50 49 45 46 46 41 42 43 43 49

Difference, d 11 10 7 4 2 -2 0 -4 -4 2

d2 121 100 49 16 4 4 0 16 16 4

Can the car company conclude at a 1 % significance level that the changes
they’ve made to the hybrid engine deliver 4 extra miles per gallon?

Solution:

The car company will define the values for the old engine as Population 1,
and the values for the new engine as Population 2, and their null and
alternative hypotheses will be

H0 : μ2 − μ1 ≤ 4

Ha : μ2 − μ1 > 4

where μ1 is the miles per gallon obtained by the old engine, and μ2 is the miles per gallon obtained by the new engine. And because μ2 − μ1 is the difference in miles per gallon, the hypothesis statements could also be written as

H0 : μd ≤ 4

Ha : μd > 4

where μd is the mean difference between the two populations.

To find the mean difference, we’ll sum the differences and divide by the number of matched pairs in our sample, n = 10.

$$\bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{11 + 10 + 7 + 4 + 2 + (-2) + 0 + (-4) + (-4) + 2}{10} = \frac{26}{10} = 2.6$$
So the sample mean tells us that the mean difference is 2.6 miles per gallon. Then the sample standard deviation is

$$s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}}$$

To calculate this, we’ll first find

$$\sum_{i=1}^n (d_i - \bar{d})^2$$

$$(11 - 2.6)^2 + (10 - 2.6)^2 + (7 - 2.6)^2 + (4 - 2.6)^2 + (2 - 2.6)^2$$

$$+ (-2 - 2.6)^2 + (0 - 2.6)^2 + (-4 - 2.6)^2 + (-4 - 2.6)^2 + (2 - 2.6)^2$$

$$8.4^2 + 7.4^2 + 4.4^2 + 1.4^2 + (-0.6)^2$$

$$+ (-4.6)^2 + (-2.6)^2 + (-6.6)^2 + (-6.6)^2 + (-0.6)^2$$

$$70.56 + 54.76 + 19.36 + 1.96 + 0.36 + 21.16 + 6.76 + 43.56 + 43.56 + 0.36$$

$$262.4$$

Then the sample standard deviation is

$$s_d = \sqrt{\frac{262.4}{9}}$$

$$s_d \approx \sqrt{29.16}$$

$$s_d \approx 5.400$$

Because the population standard deviation of the differences is unknown and the sample of matched pairs is small, n = 10 < 30, the test statistic will be

$$t = \frac{\bar{d} - \mu_d}{\frac{s_d}{\sqrt{n}}}$$

$$t \approx \frac{2.6 - 4}{\frac{5.400}{\sqrt{10}}}$$

$$t \approx -1.4 \cdot \frac{\sqrt{10}}{5.400}$$

$$t \approx -0.82$$

and the degrees of freedom are

df = n − 1 = 10 − 1 = 9

At a significance level of 1 % for an upper-tailed test (an upper-tail probability of 0.01), and df = 9, the t-table gives 2.821.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

The car company’s t-test statistic t ≈ −0.82 is less than the critical value t = 2.821, so the critical value approach tells them that they can’t reject the null hypothesis, and therefore can’t conclude that their new engine adds 4 miles per gallon.

3. We want to test the claim that listening to classical music while studying makes students complete their homework faster. We ask 10 students to study in silence for the first semester, and study with classical music for the second semester, then we record the mean number of hours spent on homework per week in each semester.

Student 1 2 3 4 5 6 7 8 9 10

In silence 14 13 16 21 15 19 11 20 19 16
With music 12 13 15 22 16 19 8 17 18 17

Difference, d 2 0 1 -1 -1 0 3 3 1 -1

d2 4 0 1 1 1 0 9 9 1 1

Can we conclude at a 10 % significance level that studying with classical music reduces the number of hours spent per week on homework?

Solution:

We’ll define the hours spent studying in silence as Population 1, and the
hours spent studying with classical music as Population 2, and our null and
alternative hypotheses will be

H0 : μ1 − μ2 ≤ 0

Ha : μ1 − μ2 > 0

where μ1 is the mean number of hours spent studying in silence, and μ2 is the mean number of hours spent studying with classical music. And because μ1 − μ2 is the difference in study time, the hypothesis statements could also be written as

H0 : μd ≤ 0

Ha : μd > 0

where μd is the mean difference between the two populations.

To find the mean difference, we’ll sum the differences and divide by the number of matched pairs in our sample, n = 10.

$$\bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{2 + 0 + 1 + (-1) + (-1) + 0 + 3 + 3 + 1 + (-1)}{10} = \frac{7}{10} = 0.7$$

So the sample mean tells us that the mean difference is 0.7 studying hours. Then the sample standard deviation is

$$s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}}$$

To calculate this, we’ll first find

$$\sum_{i=1}^n (d_i - \bar{d})^2$$

$$(2 - 0.7)^2 + (0 - 0.7)^2 + (1 - 0.7)^2 + (-1 - 0.7)^2$$

$$+ (-1 - 0.7)^2 + (0 - 0.7)^2 + (3 - 0.7)^2 + (3 - 0.7)^2$$

$$+ (1 - 0.7)^2 + (-1 - 0.7)^2$$

$$1.3^2 + (-0.7)^2 + 0.3^2 + (-1.7)^2 + (-1.7)^2 + (-0.7)^2 + 2.3^2 + 2.3^2 + 0.3^2 + (-1.7)^2$$

$$1.69 + 0.49 + 0.09 + 2.89 + 2.89 + 0.49 + 5.29 + 5.29 + 0.09 + 2.89$$

$$22.1$$

Then the sample standard deviation is

$$s_d = \sqrt{\frac{22.1}{9}}$$

$$s_d \approx \sqrt{2.46}$$

$$s_d \approx 1.567$$

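Because the population standard deviation of the differences is unknown and the sample of matched pairs is small, n = 10 < 30, the test statistic will be

$$t = \frac{\bar{d} - \mu_d}{\frac{s_d}{\sqrt{n}}}$$

$$t \approx \frac{0.7 - 0}{\frac{1.567}{\sqrt{10}}}$$

$$t \approx 0.7 \cdot \frac{\sqrt{10}}{1.567}$$

$$t \approx 1.413$$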

and the degrees of freedom are

df = n − 1 = 10 − 1 = 9

At a significance level of 10 % for an upper-tailed test (an upper-tail probability of 0.10), and df = 9, the t-table gives 1.383.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

Our t-test statistic t ≈ 1.413 exceeds the critical value t = 1.383, so the critical value approach tells us that we can reject the null hypothesis, and therefore conclude that studying with classical music reduces time spent on homework.

4. A clothing store wants to test the claim that customers who join their
VIP program return less merchandise. They track the mean monthly
merchandise returns of 10 customers for one year before and after joining
the VIP program, then record the mean returns per month.

Customer 1 2 3 4 5 6 7 8 9 10

Before VIP 12 55 48 23 97 103 33 44 17 29

After VIP 15 44 35 20 100 97 30 41 24 40

Can they conclude at a 5 % significance level that joining the VIP program
reduces the amount of merchandise returns?

Solution:

The clothing store will define the values for returns before enrolling in the
VIP program as Population 1, and the values for returns after enrolling in
the VIP program as Population 2, and their null and alternative hypotheses
will be

H0 : μ1 − μ2 ≤ 0

Ha : μ1 − μ2 > 0

where μ1 is the mean monthly merchandise returns before the VIP program, and μ2 is the mean monthly merchandise returns after the VIP program. And because μ1 − μ2 is the difference in monthly merchandise returns, the hypothesis statements could also be written as

H0 : μd ≤ 0

Ha : μd > 0

where μd is the mean difference between the two populations.

To find the mean difference, we’ll subtract each customer’s “after” value from their “before” value, sum the differences, and divide by the number of matched pairs in our sample, n = 10.

$$\bar{d} = \frac{\sum_{i=1}^n d_i}{n} = \frac{(-3) + 11 + 13 + 3 + (-3) + 6 + 3 + 3 + (-7) + (-11)}{10} = \frac{15}{10} = 1.5$$
So the sample mean tells us that the mean difference is 1.5 in merchandise returns. Then the sample standard deviation is

$$s_d = \sqrt{\frac{\sum_{i=1}^n (d_i - \bar{d})^2}{n - 1}}$$

To calculate this, we’ll first find

$$\sum_{i=1}^n (d_i - \bar{d})^2$$

$$(-3 - 1.5)^2 + (11 - 1.5)^2 + (13 - 1.5)^2 + (3 - 1.5)^2$$

$$+ (-3 - 1.5)^2 + (6 - 1.5)^2 + (3 - 1.5)^2 + (3 - 1.5)^2$$

$$+ (-7 - 1.5)^2 + (-11 - 1.5)^2$$

$$(-4.5)^2 + 9.5^2 + 11.5^2 + 1.5^2 + (-4.5)^2$$

$$+ 4.5^2 + 1.5^2 + 1.5^2 + (-8.5)^2 + (-12.5)^2$$

$$20.25 + 90.25 + 132.25 + 2.25 + 20.25 + 20.25 + 2.25 + 2.25 + 72.25 + 156.25$$

$$518.5$$

Then the sample standard deviation is

$$s_d = \sqrt{\frac{518.5}{9}}$$

$$s_d \approx \sqrt{57.61}$$

$$s_d \approx 7.59$$

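Because the population standard deviation of the differences is unknown and the sample of matched pairs is small, n = 10 < 30, the test statistic will be

$$t = \frac{\bar{d} - \mu_d}{\frac{s_d}{\sqrt{n}}}$$

$$t \approx \frac{1.5 - 0}{\frac{7.59}{\sqrt{10}}}$$

$$t \approx 1.5 \cdot \frac{\sqrt{10}}{7.59}$$

$$t \approx 0.625$$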

and the degrees of freedom are

df = n − 1 = 10 − 1 = 9

At a significance level of 5 % for an upper-tailed test (an upper-tail probability of 0.05), and df = 9, the t-table gives 1.833.

Upper-tail probability p

df 0.25 0.20 0.15 0.10 0.05 0.025 0.01 0.005 0.001 0.0005

8 0.706 0.889 1.108 1.397 1.860 2.306 2.896 3.355 4.501 5.041

9 0.703 0.883 1.100 1.383 1.833 2.262 2.821 3.250 4.297 4.781

10 0.700 0.879 1.093 1.372 1.812 2.228 2.764 3.169 4.144 4.587

50% 60% 70% 80% 90% 95% 98% 99% 99.8% 99.9%

Confidence level C

The clothing store’s t-test statistic t ≈ 0.625 is less than the critical value t = 1.833, so the critical value approach tells them that they can’t reject the null hypothesis, and therefore can’t conclude that their VIP program causes customers to return less merchandise.

5. If the mean difference is d̄ = 10 on a sample of n = 25 with sample standard deviation sd = 2.5, calculate the 95 % confidence interval around d̄.

Solution:

For 95 % confidence with df = n − 1 = 25 − 1 = 24, and because we have a small sample n < 30, the confidence interval will be

$$(a, b) = \bar{d} \pm t_{\alpha/2}\frac{s_d}{\sqrt{n}}$$

$$(a, b) = 10 \pm 2.064\left(\frac{2.5}{\sqrt{25}}\right)$$

$$(a, b) = 10 \pm 2.064(0.5)$$

$$(a, b) = 10 \pm 1.032$$

Then the confidence interval will be

(a, b) = (10 − 1.032,10 + 1.032)

(a, b) = (8.968,11.032)

So we’re 95 % confident that the mean difference falls between 8.968 and
11.032.

6. If the mean difference is d̄ = 24 on a sample of n = 49 with population standard deviation σd = 3.2, calculate the 99 % confidence interval around d̄.

Solution:

For 99 % confidence with critical values z = ±2.58 and a large sample n ≥ 30, the confidence interval will be

$$(a, b) = \bar{d} \pm z_{\alpha/2}\frac{\sigma_d}{\sqrt{n}}$$

$$(a, b) = 24 \pm 2.58\left(\frac{3.2}{\sqrt{49}}\right)$$

$$(a, b) \approx 24 \pm 2.58(0.457)$$

$$(a, b) \approx 24 \pm 1.179$$

Then the confidence interval will be

(a, b) ≈ (24 − 1.179,24 + 1.179)

(a, b) ≈ (22.821,25.179)

So we’re 99 % confident that the mean difference falls between 22.821 and
25.179.

CONFIDENCE INTERVAL FOR THE DIFFERENCE OF PROPORTIONS

1. Given x1 = 54 successes in the first sample n1 = 150, and x2 = 47 successes in the second sample n2 = 160, calculate a 95 % confidence interval around the difference of proportions.

Solution:

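The sample proportions are

$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{54}{150} = 0.36$$

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{47}{160} \approx 0.294$$

At 95 % confidence, the critical z-values are z = ±1.96, so the confidence interval will be

$$(a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$(a, b) \approx (0.36 - 0.294) \pm 1.96\sqrt{\frac{0.36(1 - 0.36)}{150} + \frac{0.294(1 - 0.294)}{160}}$$

$$(a, b) \approx 0.066 \pm 1.96\sqrt{\frac{0.36(0.64)}{150} + \frac{0.294(0.706)}{160}}$$

$$(a, b) \approx 0.066 \pm 0.104$$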

Then the 95 % confidence interval is

(a, b) ≈ (0.066 − 0.104,0.066 + 0.104)

(a, b) ≈ (−0.038,0.170)

Because the confidence interval includes 0, we can’t conclude that there’s a difference between the population proportions.
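The interval for a difference of proportions can be sketched the same way. The function name `diff_proportions_ci` is hypothetical, and because the code keeps the unrounded sample proportions and critical value, its endpoints agree with the hand calculation only to about three decimal places.

```python
from math import sqrt
from statistics import NormalDist

def diff_proportions_ci(x1, n1, x2, n2, confidence):
    """Confidence interval for p1 - p2 from two independent samples.

    Hypothetical helper for illustration only.
    """
    p1, p2 = x1 / n1, x2 / n2
    z_crit = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z_crit * sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    diff = p1 - p2
    return diff - margin, diff + margin

low, high = diff_proportions_ci(54, 150, 47, 160, confidence=0.95)
print(round(low, 3), round(high, 3))
print("interval contains 0:", low < 0 < high)
```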

2. A light bulb manufacturer wants to know whether their own bulbs last
longer than a competitor’s bulb. They randomly sampled 150 people who
bought their bulb, and 72 of them reported that it lasted longer than 250
days. They randomly sampled 150 people who bought the competitor’s
bulb, and 69 of them reported that it lasted for more than 250 days. Find a
90 % confidence interval around the difference of proportions.

Solution:

The sample proportions are

$$\hat{p}_1 = \frac{72}{150} = 0.48$$

$$\hat{p}_2 = \frac{69}{150} = 0.46$$

At 90 % confidence, the critical z-values are z = ± 1.65, so the confidence interval will be

$$(a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$(a, b) = (0.48 - 0.46) \pm 1.65\sqrt{\frac{0.48(1 - 0.48)}{150} + \frac{0.46(1 - 0.46)}{150}}$$

$$(a, b) = 0.02 \pm 1.65\sqrt{\frac{0.48(0.52)}{150} + \frac{0.46(0.54)}{150}}$$

(a, b) ≈ 0.02 ± 0.095

Then the 90 % confidence interval is

(a, b) ≈ (0.02 − 0.095,0.02 + 0.095)

(a, b) ≈ (−0.075,0.115)

Because the confidence interval includes 0, we can’t conclude that there’s a difference between the proportion of bulbs from each company that last longer than 250 days.

3. A research team wants to know whether Vitamin C shortens recovery time from the common cold. They chose 100 patients with the common cold and randomly assigned 50 of them to the Vitamin C treatment group and 50 of them to the placebo group. In the Vitamin C group, 38 patients recovered in less than 7 days, while 24 patients in the placebo group recovered in less than 7 days. Find a 99 % confidence interval around the difference in population proportions.

Solution:

The sample proportions are

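$$\hat{p}_1 = \frac{38}{50} = 0.76$$

$$\hat{p}_2 = \frac{24}{50} = 0.48$$

At 99 % confidence, the critical z-values are z = ± 2.58, so the confidence interval will be

$$(a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$(a, b) = (0.76 - 0.48) \pm 2.58\sqrt{\frac{0.76(1 - 0.76)}{50} + \frac{0.48(1 - 0.48)}{50}}$$

$$(a, b) = 0.28 \pm 2.58\sqrt{\frac{0.76(0.24)}{50} + \frac{0.48(0.52)}{50}}$$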

(a, b) ≈ 0.28 ± 0.24

Then the 99 % confidence interval is

(a, b) ≈ (0.28 − 0.24,0.28 + 0.24)

(a, b) ≈ (0.04,0.52)

We can be 99 % confident that the true difference between population proportions is between 0.04 and 0.52. Because the entire interval is above 0, we can be 99 % confident that the Vitamin C treatment shortens recovery time from the common cold.

4. A researcher randomly chose 900 smokers, 450 men and 450 women.
He found that 357 of the male smokers have been diagnosed with coronary
artery disease, while 295 of the female smokers have been diagnosed with
coronary artery disease. Construct a 95 % confidence interval to estimate
the difference between the proportions of male and female smokers who
have been diagnosed with coronary artery disease.

Solution:

The sample proportions are

$$\hat{p}_1 = \frac{357}{450} \approx 0.793$$

$$\hat{p}_2 = \frac{295}{450} \approx 0.656$$

At 95 % confidence, the critical z-values are z = ± 1.96, so the confidence interval will be

$$(a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$(a, b) = (0.793 - 0.656) \pm 1.96\sqrt{\frac{0.793(1 - 0.793)}{450} + \frac{0.656(1 - 0.656)}{450}}$$

$$(a, b) = 0.137 \pm 1.96\sqrt{\frac{0.793(0.207)}{450} + \frac{0.656(0.344)}{450}}$$

(a, b) ≈ 0.137 ± 0.058

Then the 95 % confidence interval is

(a, b) ≈ (0.137 − 0.058,0.137 + 0.058)

(a, b) ≈ (0.079,0.195)

We can be 95 % confident that the true difference between the proportion of male smokers with coronary artery disease and the proportion of female smokers with coronary artery disease is between 0.079 and 0.195.

5. In a simple random sample of 1,000 people aged 20–24, 7 % said they ran at least one marathon in the last year. In a simple random sample of 1,200 people aged 25–29, 12 % said they ran at least one marathon in the last year. Find a 99 % confidence interval around the difference of population proportions.

Solution:

With $\hat{p}_1 = 0.07$ for $n_1 = 1{,}000$ and $\hat{p}_2 = 0.12$ for $n_2 = 1{,}200$, and critical values of z = ± 2.58 for a 99 % confidence level, the confidence interval will be

$$(a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$(a, b) = (0.07 - 0.12) \pm 2.58\sqrt{\frac{0.07(1 - 0.07)}{1{,}000} + \frac{0.12(1 - 0.12)}{1{,}200}}$$

$$(a, b) = -0.05 \pm 2.58\sqrt{\frac{0.07(0.93)}{1{,}000} + \frac{0.12(0.88)}{1{,}200}}$$

(a, b) ≈ − 0.05 ± 0.032

Then the 99 % confidence interval is

(a, b) ≈ (−0.05 − 0.032, − 0.05 + 0.032)

(a, b) ≈ (−0.082, − 0.018)

We can be 99 % confident that the true difference of proportions is between −0.082 and −0.018. Because the entire interval is below 0, it’s likely that a larger proportion of people aged 25–29 ran at least one marathon in the last year, compared to people aged 20–24.

6. In a simple random sample of 280 Masters students from one university, 24 said they planned to pursue a PhD. In a simple random sample of 350 Masters students at a second university, 34 said they planned to pursue a PhD. Build a 98 % confidence interval around the difference of proportions.

Solution:

The sample proportions are

$$\hat{p}_1 = \frac{24}{280} \approx 0.086$$

$$\hat{p}_2 = \frac{34}{350} \approx 0.097$$

At 98 % confidence, the critical z-values are z = ± 2.33, so the confidence interval will be

$$(a, b) = (\hat{p}_1 - \hat{p}_2) \pm z_{\alpha/2}\sqrt{\frac{\hat{p}_1(1 - \hat{p}_1)}{n_1} + \frac{\hat{p}_2(1 - \hat{p}_2)}{n_2}}$$

$$(a, b) = (0.086 - 0.097) \pm 2.33\sqrt{\frac{0.086(1 - 0.086)}{280} + \frac{0.097(1 - 0.097)}{350}}$$

$$(a, b) = -0.011 \pm 2.33\sqrt{\frac{0.086(0.914)}{280} + \frac{0.097(0.903)}{350}}$$

(a, b) ≈ − 0.011 ± 0.054

Then the 98 % confidence interval is

(a, b) ≈ (−0.011 − 0.054, − 0.011 + 0.054)

(a, b) ≈ (−0.065,0.043)

Because the confidence interval includes 0, we can’t conclude that there’s a difference between the proportion of Masters students at each university who want to pursue a PhD.

HYPOTHESIS TESTING FOR THE DIFFERENCE OF PROPORTIONS

1. We defined the hypothesis statements below, and then found sample proportions of $\hat{p}_1 = 0.456$ for $n_1 = 278$ and $\hat{p}_2 = 0.384$ for $n_2 = 310$. Using a critical value approach, can we reject the null hypothesis at a confidence level of 95 %?

H0 : p1 − p2 ≤ 0

Ha : p1 − p2 > 0

Solution:

The pooled proportion is

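$$\hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}$$

$$\hat{p} = \frac{0.456(278) + 0.384(310)}{278 + 310}$$

$$\hat{p} \approx 0.418$$

Then the z-statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$z \approx \frac{0.456 - 0.384}{\sqrt{0.418(1 - 0.418)\left(\frac{1}{278} + \frac{1}{310}\right)}}$$

$$z \approx \frac{0.072}{\sqrt{0.418(0.582)\left(\frac{1}{278} + \frac{1}{310}\right)}}$$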

z ≈ 1.767

For an upper-tailed test at a confidence level of 95 %, the critical value is z = 1.65. Since 1.767 > 1.65, we can reject the null hypothesis and conclude that p1 > p2 at a 95 % confidence level.
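
The critical value approach used here can be sketched in a few lines of Python. The function name `pooled_z_from_proportions` is our own, and the critical value 1.65 is the rounded table value from the solution, supplied as an input.

```python
import math

def pooled_z_from_proportions(p1_hat, n1, p2_hat, n2):
    """Pooled two-proportion z-statistic for H0: p1 - p2 = 0."""
    p_pool = (p1_hat * n1 + p2_hat * n2) / (n1 + n2)
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

z = pooled_z_from_proportions(0.456, 278, 0.384, 310)

# Upper-tailed test: reject H0 only when z exceeds the critical value.
z_crit = 1.65
reject_null = z > z_crit
print(round(z, 3), reject_null)
```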

2. Given the hypothesis statements below, x1 = 234 with n1 = 1,150 and x2 = 327 with n2 = 1,320, calculate the test statistic.

H0 : p1 − p2 = 0

Ha : p1 − p2 ≠ 0

Solution:

First calculate the sample proportions.

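$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{234}{1{,}150} \approx 0.203$$

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{327}{1{,}320} \approx 0.248$$

The pooled proportion is

$$\hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}$$

$$\hat{p} = \frac{0.203(1{,}150) + 0.248(1{,}320)}{1{,}150 + 1{,}320}$$

$$\hat{p} \approx 0.227$$

Then the z-statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$z \approx \frac{0.203 - 0.248}{\sqrt{0.227(1 - 0.227)\left(\frac{1}{1{,}150} + \frac{1}{1{,}320}\right)}}$$

$$z \approx \frac{-0.045}{\sqrt{0.227(0.773)\left(\frac{1}{1{,}150} + \frac{1}{1{,}320}\right)}}$$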

z ≈ − 2.663
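
The value of z above is computed from proportions rounded to three decimals. As an illustrative check (the function name `z_statistic` is our own), the sketch below evaluates the same formula twice: once with the rounded values and once at full precision from the raw counts. The second result comes out a little smaller in magnitude, which shows how much intermediate rounding can move the test statistic.

```python
import math

def z_statistic(p1_hat, p2_hat, p_pool, n1, n2):
    """Pooled two-proportion z-statistic for H0: p1 - p2 = 0."""
    se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
    return (p1_hat - p2_hat) / se

x1, n1, x2, n2 = 234, 1150, 327, 1320

# Three-decimal values, as in the worked solution.
z_rounded = z_statistic(0.203, 0.248, 0.227, n1, n2)

# Full-precision values computed straight from the counts.
z_exact = z_statistic(x1 / n1, x2 / n2, (x1 + x2) / (n1 + n2), n1, n2)

print(round(z_rounded, 3), round(z_exact, 3))
```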

3. A cinema owner wants to know whether there’s a difference in the
number of boys and girls who watched a new movie last week. She
randomly sampled 76 boys and 75 girls and found that 45 boys and 58 girls
watched the movie. What can she conclude about the difference of
proportions at a 99 % confidence level?

Solution:

The owner is running a two-tailed test, so her hypothesis statements will be

H0 : p1 − p2 = 0

Ha : p1 − p2 ≠ 0

The sample proportions are

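$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{45}{76} \approx 0.592$$

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{58}{75} \approx 0.773$$

Then the pooled proportion is

$$\hat{p} = \frac{45 + 58}{76 + 75}$$

$$\hat{p} \approx 0.682$$

and the test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$z = \frac{0.592 - 0.773}{\sqrt{0.682(1 - 0.682)\left(\frac{1}{76} + \frac{1}{75}\right)}}$$

$$z = \frac{-0.181}{\sqrt{0.682(0.318)\left(\frac{1}{76} + \frac{1}{75}\right)}}$$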

z ≈ − 2.39

For a two-tailed test at 99 % confidence, the critical values will be z = ± 2.58. Because −2.58 < −2.39 < 2.58, the test statistic doesn’t fall in the rejection region, so the cinema owner fails to reject the null hypothesis and can’t conclude that there’s a difference between the proportions of boys and girls who watched the new movie last week.
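
The solution uses critical values, but the same decision can be cross-checked with a two-tailed p-value. The sketch below is our own addition, not part of the original solution. It takes the test statistic −2.39 as given and uses only Python's standard library.

```python
from statistics import NormalDist

z = -2.39      # test statistic from the solution
alpha = 0.01   # significance level for 99 % confidence

# Two-tailed test: double the area in the tail beyond |z|.
p_value = 2 * NormalDist().cdf(-abs(z))
reject_null = p_value < alpha
print(round(p_value, 4), reject_null)
```

The p-value is larger than 0.01, which agrees with failing to reject the null hypothesis.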

4. A store owner believes that women spend at least 22 % more in his store than men. He randomly chooses 64 visitors, 32 men and 32 women, and finds that 14 men spent more than $100, while 23 women spent more than $100. Using a p-value approach, what can he conclude at a 90 % confidence level?

Solution:

Assuming p1 is the population proportion of women who spend more than
$100 and p2 is the population proportion of men who spend more than $100,
the store owner’s hypothesis statements for the upper-tailed test will be

H0 : p1 − p2 ≤ 0.22

Ha : p1 − p2 > 0.22

The sample proportions are

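$$\hat{p}_1 = \frac{x_1}{n_1} = \frac{23}{32} \approx 0.7188$$

$$\hat{p}_2 = \frac{x_2}{n_2} = \frac{14}{32} = 0.4375$$

The pooled proportion is

$$\hat{p} = \frac{x_1 + x_2}{n_1 + n_2}$$

$$\hat{p} = \frac{23 + 14}{32 + 32}$$

$$\hat{p} \approx 0.578$$

Then the test statistic is

$$z = \frac{(\hat{p}_1 - \hat{p}_2) - (p_1 - p_2)}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$z \approx \frac{(0.7188 - 0.4375) - 0.22}{\sqrt{0.578(1 - 0.578)\left(\frac{1}{32} + \frac{1}{32}\right)}}$$

$$z \approx \frac{0.0613}{\sqrt{0.578(0.422)\left(\frac{1}{16}\right)}}$$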

z ≈ 0.50

Looking up this z-value in the z-table,

z     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
0.4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
0.5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
0.6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549

we can see that the area to the left of z = 0.50 is 0.6915. Because this is an upper-tailed test, we’re interested in the area to the right of z = 0.50, which is 1 − 0.6915 = 0.3085. Therefore, p = 0.3085. Since 0.3085 > 0.1, the store owner fails to reject the null hypothesis. There’s not enough evidence to conclude that women spend at least 22 % more than men.
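
The p-value approach can also be sketched in Python without a z-table. This is a rough check under the same setup as the solution: a pooled standard error and a hypothesized difference of 0.22. The variable names are our own. Because the code doesn't round z to 0.50 before the lookup, the p-value differs slightly in the third decimal.

```python
import math
from statistics import NormalDist

x1, n1 = 23, 32   # women who spent more than $100
x2, n2 = 14, 32   # men who spent more than $100
d0 = 0.22         # hypothesized difference under the null
alpha = 0.10      # significance level for 90 % confidence

p1_hat, p2_hat = x1 / n1, x2 / n2
p_pool = (x1 + x2) / (n1 + n2)
se = math.sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = ((p1_hat - p2_hat) - d0) / se

# Upper-tailed test: the p-value is the area to the right of z.
p_value = 1 - NormalDist().cdf(z)
fail_to_reject = p_value > alpha
print(round(z, 2), round(p_value, 3), fail_to_reject)
```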

5. In a random sample of 60 people under the age of 30, 14 % said they’re planning to go hiking next month. In a random sample of 75 people older than 50, 23 % said they’re planning to go hiking next month. Using a critical value approach at a 95 % confidence level, is there enough evidence to conclude that a higher proportion of people over age 50 plan to go hiking next month than the proportion of people under 30 who plan to go hiking?

Solution:

If p1 is the proportion of people under 30 who plan to hike, and p2 is the proportion of people over 50 who plan to hike, then the hypothesis statements are

H0 : p1 − p2 ≥ 0

Ha : p1 − p2 < 0

The sample proportions are

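$$\hat{p}_1 = 0.14$$

$$\hat{p}_2 = 0.23$$

The pooled proportion is

$$\hat{p} = \frac{0.14(60) + 0.23(75)}{60 + 75}$$

$$\hat{p} = 0.19$$

Then the test statistic is

$$z = \frac{\hat{p}_1 - \hat{p}_2}{\sqrt{\hat{p}(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}}$$

$$z = \frac{0.14 - 0.23}{\sqrt{0.19(1 - 0.19)\left(\frac{1}{60} + \frac{1}{75}\right)}}$$

$$z = \frac{-0.09}{\sqrt{0.19(0.81)\left(\frac{1}{60} + \frac{1}{75}\right)}}$$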

z ≈ − 1.325

For a 95 % confidence level and a lower-tailed test, the critical z-value is z = − 1.65. Since −1.325 > − 1.65, we fail to reject the null hypothesis. There’s not enough evidence to conclude that the proportion of people over 50 who plan to hike is higher than the proportion of people under 30 who plan to hike.
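
For a one-tailed test, all of α sits in a single tail, which is why the critical value here is about −1.65 rather than ±1.96. A minimal Python sketch of the lower-tailed decision follows. The variable names are our own and the test statistic is taken from the solution.

```python
from statistics import NormalDist

alpha = 0.05   # significance level for 95 % confidence
z = -1.325     # test statistic from the solution

# Lower-tailed test: reject H0 only if z falls below the alpha-quantile.
z_crit = NormalDist().inv_cdf(alpha)
reject_null = z < z_crit
print(round(z_crit, 3), reject_null)
```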

6. John and Steven are two fitness trainers who want to compare their client satisfaction rates. John chose a random sample of 85 clients and Steven chose a random sample of 72 clients. John found that 89 % of his clients were satisfied and Steven found that 91 % of his clients were satisfied. Using a critical value approach at a 95 % confidence level, is there a significant difference between proportions?

Solution:

John and Steven are running a two-tailed test, so their hypothesis statements are

H0 : p1 − p2 = 0

Ha : p1 − p2 ≠ 0

The sample proportions are

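$$\hat{p}_1 = 0.89$$

$$\hat{p}_2 = 0.91$$

The pooled proportion is

$$\hat{p} = \frac{\hat{p}_1 n_1 + \hat{p}_2 n_2}{n_1 + n_2}$$

$$\hat{p} = \frac{0.89(85) + 0.91(72)}{85 + 72}$$

$$\hat{p} \approx 0.899$$

Then the test statistic is

$$z = \frac{0.89 - 0.91}{\sqrt{0.899(1 - 0.899)\left(\frac{1}{85} + \frac{1}{72}\right)}}$$

$$z = \frac{-0.02}{\sqrt{0.899(0.101)\left(\frac{1}{85} + \frac{1}{72}\right)}}$$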

z ≈ − 0.414

For a 95 % confidence level and a two-tailed test, the critical z-values are
z = ± 1.96. Since −0.414 falls between −1.96 and 1.96, we fail to reject the null
hypothesis. There’s not enough evidence to conclude that there’s a
significant difference between John and Steven’s client satisfaction rate.
