Hypothesis Testing For A Single Sample
Intro to Hypothesis Testing
• The procedure and the reasoning used to reach a decision in a hypothesis test.
• The test assesses the evidence provided by the data against the null claim (the claim about the parameter).
• It is one of the values of the sample statistic that are likely to occur (weak evidence). Seen in Figure 1.
• It is one of the values of the sample statistic that are unlikely to occur (strong evidence). Seen in Figure 2.
Figure 1
Figure 2
Setting the Stage Class Example 1: The General Procedure
i. The underlying distribution of x is normal and the population standard deviation, σ, is known
(remember this is often not the case).
ii. Also
If this is the case, then x̄ has a normal distribution with a test statistic: z = (x̄ − μ) / (σ / √n)
Definition: Test Statistic: A test statistic is a value computed using sample data. It is the value
• Some examples for normal based methods include the test statistics.
Class Example 1
Suppose a company makes a claim about one of its products. For example, Dannon claims that
their peanut butter banana flavored Oikos Triple Zero yogurt has 100 calories per cup.
This constitutes step 1 of statistical hypothesis testing, namely we have 2 claims about what
is going on in the population, one of which is our stance for the testing.
Hypothesis statements:
• The null hypothesis, H0, normally states that the parameter of interest equals a specified or
hypothesized value.
• Think of this as the status quo. It can be a claim or come from a past historical value.
• We can have inequalities applied (for example, ≤ or ≥), but we will not see that here.
• The alternate hypothesis, Ha, states that the parameter is less than, greater than, or simply
not equal to the hypothesized value.
For example: It's standard knowledge that IQ tests are designed to be normally
To determine the legitimacy of Dannon’s claim and hopefully support our belief, we obtain a
random sample, say of 64 cups, and calculate the sample mean (summarize the sample).
• This constitutes step 2 of statistical hypothesis testing, where we obtain a random sample,
• Very often we will see the normal distribution as the underlying population distribution.
• However, we must note whether we know the population standard deviation or not
From our sample, the following question arises: Question: Based on this random sample, do we have
enough evidence to say the mean number of calories in a cup of Dannon’s yogurt is more than 100
calories? Well what can happen?
• If the sample mean is 100 calories we have no evidence that Dannon has made a false claim. If it is
less it refutes our alternative claim that the mean is more than 100.
• Even if the sample mean is greater than 100 calories, it is possible that Dannon is still telling the truth.
• However, at some point, say a sample average of 150 calories per cup, it will be clear that Dannon has
no hope of claiming that they are telling the truth about their product.
For example, say the sample data found the mean calories per cup to be 104.
• This is more than 100 but we must consider this is just a sample of yogurts - it isn't every yogurt.
It's possible that the yogurts that just happened to be in our sample were those with a slightly
higher calorie count. Maybe this could just happen randomly.
• To determine if it really is that different from the population mean, we need to find out how
probable a sample mean of 104 calories would be if the true population mean was 100.
• This constitutes step 3 of statistical hypothesis testing, where we figure out how likely it is to
observe data like the data we obtained given that Dannon’s claim, the null claim, is true.
P-Values
Figure 1
i. Upper-tailed test:
ii. Lower-tailed test:
iii. Two-tailed test:
Class Example 1 Continued
To answer this, from chapter 8 we know that the sampling distribution of x̄ is approximately normal, since the
sample size is greater than 30, and thus we can invoke the central limit theorem. Consequently,
• μ_x̄ = μ and σ_x̄ = σ/√n.
• Let’s assume that we know the population standard deviation, σ. With this information, the
probability of observing a random sample with a mean of 104 if the population mean is 100 is about
0.0014 (greater than the population mean, so it is the right tail (direction of inequality sign)).
This means that about 14 in 10000 random samples will have a mean that large. So, we have two
conclusions:
• We just observed an extremely rare or unusual event, or the mean calories in Oikos Triple Zero is
greater than 100 (thereby rejecting the claim).
This is the idea behind hypothesis testing: a rare or surprising event signifies statistically significant
evidence and a change and shift from the status quo.
• This constitutes the final step of statistical hypothesis testing, where we make our decision (does the
observed data provide statistically significant strong evidence to reject Dannon’s claim, or does it fail to
reject the claim), and provide an interpretation within the context of the problem.
The decision is based on comparing the P-value with the level of significance, α, that we are
testing at. We conclude our test by comparing the two as follows:
• We reject H0 if the P-value ≤ α.
• We fail to reject H0 if the P-value > α.
Class Example 2: Innocent Until Proven Guilty
Let’s consider a criminal trial. What is the null hypothesis? What is the alternative hypothesis?
• The alternative hypothesis is what we are trying to determine whether the sample evidence supports (and thus reject the
null): the defendant is guilty.
• To determine which hypothesis is correct, the jury will listen to the evidence.
Only if there is “evidence beyond a reasonable doubt” would the null hypothesis be rejected in favor of the alternative
hypothesis. If we do NOT have convincing evidence, then we would “fail to reject” the null hypothesis.
• Remember that the verdict that is returned is “Guilty” or “Not Guilty” (there is no verdict of “innocence”).
• We never end up determining the null hypothesis is true – only that there is not enough evidence to say it’s not true.
Stating Conclusions of Hypothesis Tests
Remember, we must be very vigilant when we state our conclusions. The way this type of
hypothesis testing works is that we look for evidence to support the alternative claim. If we
find it, we say we have enough to support the alternative. If we don't find that evidence, we
say just that - we don't have enough evidence to support the alternative hypothesis.
• We never support the null hypothesis. In fact, the null hypothesis is probably
• We reject H0
• We fail to reject H0
When you perform a hypothesis test you
decide to: reject H0 or fail to reject H0.
I. We could decide to not reject the null hypothesis when in reality the null hypothesis was true. This would be a
correct decision.
II. We could reject the null hypothesis when in reality the alternative hypothesis is true. This would also be a correct
decision.
III. We could reject the null hypothesis when it really is true. We call this error a Type I error.
IV. We could decide to not reject the null hypothesis, when in reality we should have because the alternative was
true. We call this error a Type II error.
Definitions: Types of Errors
A Type I error is the error of rejecting H0 when H0 is in fact true (finding convincing evidence that Ha is true when it isn’t).
• α = P(Type I error).
• Recall seeing α from last chapter. α is called the level of significance of the test.
• We can select the value of α: with standard testing procedures we have control over it.
• Selecting a significance level of α = 0.05 results in a test procedure that, used over and over with different samples,
rejects a true H0 about 5 times in 100.
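The "about 5 times in 100" statement can be checked with a small simulation. The sketch below is only an illustration: the population values (mean 100, σ = 10), the sample size of 30, and the helper names are assumptions chosen for the demo, not values from the slides. It repeatedly samples from a population where H0 is true and counts how often an upper-tailed z test at α = 0.05 rejects anyway.

```python
import math
import random


def upper_tail_p_value(z):
    """Area under the standard normal curve to the right of z."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def type_one_error_rate(mu0=100.0, sigma=10.0, n=30, alpha=0.05,
                        trials=10_000, seed=1):
    """Fraction of samples drawn under a TRUE H0 that still reject H0.

    mu0, sigma and n are hypothetical values used only for this demo.
    """
    rng = random.Random(seed)
    rejections = 0
    for _ in range(trials):
        # Draw a sample from a population where H0 (mu = mu0) really holds.
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        x_bar = sum(sample) / n
        z = (x_bar - mu0) / (sigma / math.sqrt(n))
        # Decision rule: reject H0 when the P-value is at most alpha.
        if upper_tail_p_value(z) <= alpha:
            rejections += 1
    return rejections / trials


rate = type_one_error_rate()
print(f"Estimated Type I error rate: {rate:.3f}")
```

The estimated rate should land close to the chosen α, which is the sense in which α is the probability of a Type I error.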
A Type II error is the error of failing to reject H0 when H0 is false.
• β = P(Type II error).
• The expression, 1 − β, is known as the power of the test (also called the sensitivity of the test).
• It is the probability of rejecting the null hypothesis when it is false.
Decision Table
A visualization can be helpful in remembering the types of errors.
                            Reality/Truth
Decision                    H0 is true        H0 is false
Reject H0                   Type I error      Correct
Fail to reject H0           Correct           Type II error
In the jury trial example, contextually what two errors could a jury make?
Thinking in medical terms:
• 1 − β is called the sensitivity of the test.
• It measures how often a test correctly generates a positive result for
people who have the condition that’s being tested for (also known as the
“true positive” rate).
• 1 − α is called the specificity of the test.
• It measures the test’s ability to correctly generate a negative result for people
who don’t have the condition that’s being tested for (also known as the
“true negative” rate).
Example 1 on Example Sheet Part a
Water specimens are taken from water used for cooling as it is being discharged from a power plant into a river. It has been
determined that as long as the mean temperature of the discharged water is not above 150°F, there will be no negative effects on the
river’s ecosystem. To investigate whether the plant is following this regulation, the temperature is determined for 50 water
specimens at randomly selected times. The resulting data will be used to test the hypotheses H0: μ = 150 versus Ha: μ > 150.
• A Type I error is obtaining convincing evidence that the mean water temperature is greater than 150°F (conclude Ha) when in
fact it is not.
• A Type II error is obtaining weak evidence that the mean water temperature is greater than 150°F (we cannot reject the
null hypothesis) when in fact it is greater than 150°F.
Example 1 on Example Sheet Part b
b. Which type of error would you consider more serious? Explain
If a Type II error occurs, then the ecosystem will be harmed and no action will be taken.
This could be considered more serious than a Type I error, where a company will be
required to change its practices when in fact it is not contravening the regulations.
Example 1 on Example Sheet Extension
As an extension of part b of the previous example: consider the screening of breast cancer. If an MRI (Magnetic
Resonance Imaging) examines a patient and detects malignant growth, the patient can receive treatment.
Suppose that such a screening test is used to decide between a null hypothesis of H0: cancer is present and an
alternative hypothesis of Ha: no cancer is present.
In this scenario:
• A false positive or Type I error is concluding that no cancer is present when, in fact, the illness is present.
• A false negative or Type II error is concluding that cancer is present, when in fact, it is not.
• A Type I error is more serious because the patient actually has cancer but, because we believe they do not,
they will miss out on treatment and the cancer will only get worse.
Idea:
• The seriousness of each error is based on the situation and hypothesis test being done. In fact, the errors have
a relationship with each other.
Relationships Between Errors
Ideally, we want the probability of each error to be zero: (Type I and II respectively).
• This is impossible to achieve since we must base our decision on sample data.
Standard test procedures allow us to select , the significance level of the test, but we have no direct control over . So why not
• The reason we don’t is that have an inverse relationship: If one goes up the other goes down (and vice versa
• As we try to reduce , the possibility of making a error increases (and vice versa).
After assessing the consequences of type I and type II errors (which type of error is more severe), identify the largest that is
tolerable for the problem. Then employ a test procedure that uses this maximum acceptable value –rather than anything
c. Based on your answer to part (b), with our standard testing procedures, would we
• Increase it: probability of a type 1 will increase, but type II, the worse error, will decrease.
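The inverse relationship between the two error probabilities can be seen numerically. The following is a minimal sketch for an upper-tailed z test with known σ. The null value of 150 and n = 50 echo the water-temperature example, but the true mean of 151, σ = 3, and the function name are assumptions made up for illustration. It computes β for several choices of α.

```python
from math import sqrt
from statistics import NormalDist


def type_two_error(alpha, mu0, mu_true, sigma, n):
    """beta for an upper-tailed z test of H0: mu = mu0 when the true mean is mu_true."""
    std_normal = NormalDist()
    # Reject H0 when the z statistic is at least this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # Distance between the true mean and mu0, measured in standard errors.
    shift = (mu_true - mu0) / (sigma / sqrt(n))
    # P(fail to reject H0 | mu = mu_true): the z statistic is N(shift, 1) here.
    return std_normal.cdf(z_crit - shift)


# Hypothetical scenario: sigma = 3 and a true mean of 151 are assumptions.
alphas = [0.10, 0.05, 0.01]
betas = [type_two_error(a, mu0=150, mu_true=151, sigma=3, n=50) for a in alphas]
for a, b in zip(alphas, betas):
    print(f"alpha = {a:.2f}  beta = {b:.3f}  power = {1 - b:.3f}")
```

Reading down the printed rows, a smaller α comes with a larger β (and lower power), which is why the largest tolerable α is preferred when a Type II error is the more serious one.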
Statistical Versus Practical Significance:
What does “statistically significant” mean in terms of our conclusion?
• When the value of the test statistic leads to rejection of the null, H0, it is customary to say that the result is
statistically significant at the chosen significance level α.
• The finding of statistical significance means that there is strong evidence for the observed deviation
(cannot reasonably be attributed to only chance variation)
Practical significance refers to the magnitude of the difference, which is known as the effect size.
• Results are practically significant when the difference is large enough to be meaningful in real life. What is
meaningful may be subjective and may depend on the context.
• Sample size has no effect.
• Beware of studies with very large sample sizes that claim statistical significance.
Example
Research question: Are SAT-Math scores at one college greater than the known population mean of 500?
• Data are collected from a random sample of 1,200 students at that college. In that sample, x̄ = 506. The population
standard deviation is known to be 100. A one-sample mean test was performed and the resulting P-value
was 0.0188. Because the P-value is less than α, the null hypothesis should be rejected. These results are statistically significant.
• But, let's also consider practical significance. The difference between an SAT-Math score of 500 and an SAT-
Math score of 506 is very small. With a standard deviation of 100, this difference is only 0.06 standard deviations.
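The SAT-Math numbers can be checked directly. This is a rough sketch that assumes the sample mean is 506 (implied by the comparison of 500 with 506 in the text) and uses an upper-tailed z test with the known σ = 100; the function name is illustrative only.

```python
import math


def one_sample_z_test_upper(x_bar, mu0, sigma, n):
    """Upper-tailed one-sample z test with known population sigma."""
    z = (x_bar - mu0) / (sigma / math.sqrt(n))
    # Area under the standard normal curve to the right of z.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value


# x_bar = 506 is inferred from the text; the other values are stated in the example.
z, p_value = one_sample_z_test_upper(x_bar=506, mu0=500, sigma=100, n=1200)

# Practical significance: the difference expressed in standard deviations.
effect_size = (506 - 500) / 100

print(f"z = {z:.2f}, P-value = {p_value:.4f}, effect size = {effect_size:.2f} SD")
```

The P-value should agree with the 0.0188 quoted in the example, while the effect size stays tiny: the large sample makes a small difference statistically detectable without making it practically important.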
Hypothesis Testing for Unknown σ
In the remaining section, we will apply our hypothesis
methodology to testing population means when σ is
unknown:
With P-value: The area under the t curve with n − 1 degrees of freedom
• Appendix Table 4 (or a distribution calculator like SALT) is
a tabulation of upper-tail t curve areas to the right of
critical values. These areas are P-values for upper-tailed
tests and, by symmetry, also for lower-tailed tests.
I. Establish H0 and Ha and note the level of significance. What type of test are we doing?
• The null hypothesis is that μ is still 12.5, even for the new drug 6-mp (there is no change).
• Ha is that 6-mp has a different remission time than the previous drug. Therefore: H0: μ = 12.5 versus Ha: μ ≠ 12.5 (a two-tailed test).
• Since we do not know the population standard deviation, σ, and are testing the population mean, the test
statistic we would use for this test is the t test statistic: t = (x̄ − μ) / (s / √n)
However, we must check the requirements of the sampling distribution of x̄ with an unknown population
standard deviation before we continue.
ii. Although n is only 21, the population distribution is known to be approximately normal
• If we use the table, since one tail areas are given and we have a two-tailed test, we must double
the area.
• Recall table 3 gives the central area, but table 4 gives the area of the tails.
Example 2 on Example Sheet
IV. Conclude the test
V. Interpretation: Based on the sample data, at the 1% level of significance, the evidence is not statistically significant to
reject H0, and thus there is unconvincing evidence that the drug 6-mp provides a different average remission time than
the previous drug used by doctors.
Additional Questions:
i. What if the level of significance was changed to 5%? Does anything change?
• The conclusion would change. We would now have convincing evidence to reject the null, since the P-value would be less than α = 0.05.
Assumptions:
• x̄ and s are the sample mean and sample standard deviation from a random sample.
• The sample size is large, generally n ≥ 30, or the population distribution is at least approximately normal.
• σ is unknown.
Relationship Between Hypothesis Tests and
Confidence Intervals
In the previous example, we failed to reject H0 at α = 0.01. In doing so, we are saying that 12.5 is a plausible
value for μ.
• Another way to express such information about μ is through a confidence interval. If we construct a
99% confidence interval for μ about the remission times, we find the interval to be:
• We note that this interval contains the null hypothesis value of 12.5. The confidence interval for μ
contains all the plausible values for μ. Because 12.5 is in the confidence interval, 12.5 is a plausible
value for μ. This is the connection between the two, which is based on the fact that the confidence level, 99%, equals 1 − α.
• Conversely, if you reject H0, then the CI will NOT contain the hypothesized value (by rejecting the
null you are saying the hypothesized value is not a realistic value, and then the CI supports this by
the hypothesized value being outside the interval).
Large Sample Hypothesis Tests for a Population Proportion
In this section, we consider testing hypotheses about a population
proportion when the sample size n is large. Let’s first recall some
things about proportions and the sampling distribution of p̂:
• Here we know p: the population proportion of individuals or
objects in a specified population that possess a certain property.
• We take a random sample of size n from the population to yield the sample proportion p̂.
• The sampling distribution of p̂ has the following properties: its mean is p, its standard deviation is
√(p(1 − p)/n), and it is approximately normal when n is large.
• Null hypothesis: The sample proportion comes from a population with a probability of success equal to p.
• Alternate hypothesis: The sample proportion comes from a population with a probability of success not equal to p (or greater
than, less than).
Alternative hypothesis              P-value
Ha: p > hypothesized value          Area under z curve to the right of calculated z test statistic
Ha: p < hypothesized value          Area under z curve to the left of calculated z test statistic
Ha: p ≠ hypothesized value          a) 2(Area to the right of z test statistic) if z is positive
                                    b) 2(Area to the left of z test statistic) if z is negative
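The table of P-values by alternative hypothesis can be written as one small helper. This is a sketch only: the function name and the labels 'greater', 'less', and 'two-sided' are illustrative choices, not notation from the slides.

```python
import math


def z_p_value(z, alternative):
    """P-value for a z test statistic.

    alternative: 'greater' (upper-tailed), 'less' (lower-tailed),
    or 'two-sided' (two-tailed).
    """
    upper = 0.5 * math.erfc(z / math.sqrt(2))    # area to the right of z
    lower = 0.5 * math.erfc(-z / math.sqrt(2))   # area to the left of z
    if alternative == "greater":
        return upper
    if alternative == "less":
        return lower
    if alternative == "two-sided":
        # Doubles the right area when z is positive, the left area when z is negative.
        return 2 * min(upper, lower)
    raise ValueError("alternative must be 'greater', 'less', or 'two-sided'")


for alt in ("greater", "less", "two-sided"):
    print(f"{alt:>9}: {z_p_value(1.96, alt):.4f}")
```

Taking the smaller of the two tail areas before doubling covers both rows a) and b) of the two-tailed case in a single expression.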
Assumptions:
• p̂ is the sample proportion from a random sample.
• The sample size is large. This test can be used if n satisfies both np ≥ 10 and n(1 − p) ≥ 10.
• If sampling is without replacement, the sample size is no more than 10% of the population size.
Large-Sample z-Test for p: Binomial Test
With the same sampling distribution as before, the three properties
imply that the standardized variable z = (p̂ − p) / √(p(1 − p)/n) has approximately a standard normal distribution.
Example 3 on Example Sheet
A team of eye surgeons have created a new technique for a risky eye operation to restore the sight of people blinded from
a certain disease. Under the old method, it is known that only 30% of the patients who undergo this operation recover
their eyesight. Suppose that surgeons in various hospitals are randomly selected and have performed a total of 225
operations using the new method and that 88 have been successful (i.e., the patients fully recovered their sight). Can we
justify the claim that the new method is better than the old one? Let p be the proportion of all patients who fully recover
I. Establish H0 and Ha and note the level of significance. What type of test are we doing?
• The null hypothesis is that p is still 0.30, even for the new method (no change).
• The alternate hypothesis is that the new method has improved (is better than) the chances of a patient
recovering their eyesight. Therefore: H0: p = 0.30 versus Ha: p > 0.30 (a right-tailed test).
II. Test statistic and Requirements: The test statistic we would use for this test of the population
proportion p is: z = (p̂ − p) / √(p(1 − p)/n)
• However, we must check the requirements of the sampling distribution of p̂ before we continue:
i. Verify np ≥ 10 and n(1 − p) ≥ 10: 225(0.3) = 67.5 and 225(0.7) = 157.5.
iii. It is reasonable to assume 225 procedures is less than 10% of the total number of procedures
performed.
Thus, we can use the normal distribution for the sample statistic p̂.
III. Calculate test statistic and find P-value:
z = (p̂ − p) / √(p(1 − p)/n) = (0.39 − 0.3) / √(0.3(0.7)/225) ≈ 2.95
• Since we have a right-tailed test, we want to find the P-value for: 𝑷 ( 𝒛 > 𝟐 .𝟗𝟓 )
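The calculation for this example can be reproduced with a short script. This is a sketch, not the course's official method: the function name is made up, and the "at least 10" cutoff in the size check is the common rule of thumb and is an assumption here. Note that it uses the unrounded p̂ = 88/225, so the z value comes out slightly above the 2.95 obtained from the rounded 0.39.

```python
import math


def proportion_z_test_upper(successes, n, p0):
    """Upper-tailed large-sample z test for a population proportion."""
    # Large-sample check (the cutoff of 10 is an assumed rule of thumb).
    if n * p0 < 10 or n * (1 - p0) < 10:
        raise ValueError("sample too small for the large-sample z test")
    p_hat = successes / n
    # The standard error uses the hypothesized value p0, as in the formula above.
    z = (p_hat - p0) / math.sqrt(p0 * (1 - p0) / n)
    # Right-tail area under the standard normal curve.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return p_hat, z, p_value


p_hat, z, p_value = proportion_z_test_upper(successes=88, n=225, p0=0.30)
print(f"p_hat = {p_hat:.4f}, z = {z:.2f}, P-value = {p_value:.4f}")
```

Either way the P-value is well below 0.01, so the decision at the 1% level does not depend on whether p̂ is rounded.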
V. Interpretation: Based on the sample data, at the 1% level of significance, the evidence is statistically
significant and suggests that the population proportion of successful surgeries for the new surgery
technique is higher than that of the old technique. That is, the sample provides convincing evidence in
support of the claim that more than 30% of new procedure surgeries are a success.
VII. Additional Parts:
• Thus: A type I error would be having convincing evidence to conclude that the proportion of
success for the new surgery is higher than the old method, when in reality it is not.
• Thus: A type II error would be concluding that the proportion of success for the new method is not
greater than the old method, when in reality it is an improvement over the old method and is higher
b. Very often confidence intervals are included with hypothesis tests in order to estimate an interval of
plausible values (based on sample data) for the population parameter. Suppose you calculated the 99%
CI to be (0.331, 0.476). Does this CI support the conclusion we made from the hypothesis test? Why or why not?
• Yes, it does. The confidence interval does NOT contain the hypothesized value p = 0.30, and only contains
values greater than 0.30, so this supports our conclusion made in the hypothesis test.
• If it contained 0.3, or only values lower than 0.3, then this would be an indication that either the hypothesis
test, the CI, or both, were not conducted correctly.
Question
DNA testing is not 100% accurate, unfortunately. Suppose a company that manufactures at-home DNA
testing kits claims that their results are 91% accurate. An independent lab believes the reliability rate is
less than the company's claim. The null and alternative hypotheses are as follows: H0: p = 0.91 versus Ha: p < 0.91.
Which of the following describes a Type I error in this context?
a. It is concluded that the DNA test is less than 91% accurate when, in fact, it is not 91% accurate.
b. It is concluded that the DNA test is 91% accurate when, in fact, it is more than 91% accurate.
c. It is concluded that the DNA test is less than 91% accurate when, in fact, it is 91% accurate.
d. It is concluded that the DNA test is 91% accurate when, in fact, it is less than 91% accurate.
Question (ANSWER)
c. It is concluded that the DNA test is less than 91% accurate when, in fact, it is 91% accurate.
By rejecting the null hypothesis, we conclude based on our sample that the DNA test is less than 91%
accurate, but, in fact, the DNA test is 91% accurate. The probability that this will happen is the same as
the level of significance, α.