Hypothesis Testing Part I
1. Preliminaries
Definition of Terms
• Hypothesis: a statement about a population parameter. The goal of a hypothesis test is to
decide, based on a sample from the population, which of two complementary hypotheses
is true.
• Null hypothesis and alternative hypothesis: the two complementary hypotheses in a
hypothesis testing problem.
If θ denotes a population parameter, the general format of the null and alternative
hypotheses is H0 : θ ∈ Θ0 and H1 : θ ∈ Θ0^c, where Θ0 is some subset of the parameter space
and Θ0^c ≡ Θ1 is its complement. Furthermore, Θ0 is called the null parameter space while
Θ1 is called the alternative parameter space.
• Hypothesis testing procedure or hypothesis test: a rule that specifies (i) for which sample
values the decision is made to accept H0 as true and (ii) for which sample values H0 is
rejected and H1 is accepted as true.
Note: "Accepting" a hypothesis simply means that we do not have enough evidence to
conclude otherwise and thus, we cannot reject it. Hence, it may be more appropriate to
say that we "do not reject" a hypothesis.
• Critical/rejection region: the subset of the sample space for which H0 will be rejected
• Acceptance region: the complement of the critical region
• Test statistic: a statistic whose observed value determines whether to reject or not reject
H0 . This is because the critical region and the acceptance region are formed by
partitioning the set of possible values of the test statistic into two complementary regions.
In the next section, likelihood ratio tests are discussed as one method of finding/constructing
hypothesis tests. We focus on this first and as we go into the methods of evaluating tests, we
discover another method of constructing a hypothesis test by using a particular lemma.
Def. The likelihood ratio test statistic for testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ0^c is
$$\lambda(x) = \frac{\sup_{\theta \in \Theta_0} L(\theta \mid x)}{\sup_{\theta \in \Theta} L(\theta \mid x)}$$
A likelihood ratio test (LRT) is any test that has a rejection region of the form
{x : λ(x) ≤ c}, c ∈ [0, 1].
Quick exercise: Think about the intuition behind such a test statistic. What does this ratio mean?
Note: We can see a correspondence between LRTs and MLEs. We can view taking sup_{θ∈Θ} L(θ|x)
as performing an unrestricted maximization of the likelihood function while taking sup_{θ∈Θ0} L(θ|x)
as performing a restricted maximization of the likelihood function. Performing an unrestricted
maximization of L(θ|x) yields the MLE (θ̂), provided it exists. We can also obtain the MLE for
the restricted parameter space (call it θ̂0). Hence we can also write
$$\lambda(x) = \frac{L(\hat{\theta}_0 \mid x)}{L(\hat{\theta} \mid x)}$$
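The restricted/unrestricted view can be sketched numerically. The snippet below is a minimal illustration, assuming an N(θ, 1) sample and a point null H0 : θ = θ0, so that the restricted "maximization" is trivial (Θ0 contains a single point). The data values and the function names are made up for illustration.

```python
import math

def log_likelihood(theta, xs):
    """Log-likelihood of an N(theta, 1) sample (no constants dropped)."""
    n = len(xs)
    return -0.5 * n * math.log(2 * math.pi) - 0.5 * sum((x - theta) ** 2 for x in xs)

def lrt_statistic_normal(xs, theta0):
    """lambda(x) = L(theta0_hat | x) / L(theta_hat | x) for H0: theta = theta0."""
    theta_hat = sum(xs) / len(xs)   # unrestricted MLE: the sample mean
    theta0_hat = theta0             # restricted MLE: Theta0 = {theta0} has one point
    # Work on the log scale and exponentiate the difference for stability.
    return math.exp(log_likelihood(theta0_hat, xs) - log_likelihood(theta_hat, xs))

xs = [0.3, -1.1, 0.8, 1.9, 0.4]     # made-up data, for illustration only
lam = lrt_statistic_normal(xs, theta0=0.0)
print(f"lambda(x) = {lam:.4f}")     # always lies in (0, 1]
```

Because the denominator maximizes over a larger set than the numerator, the ratio can never exceed 1; values near 0 indicate that the data are much better explained outside Θ0.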
Examples:
1. Let X1 , X2 , . . . , Xn be a random sample from N (θ, 1). Construct a likelihood ratio test
for testing H0 : θ = θ0 vs. H1 : θ ≠ θ0 .
2. Let X1 , X2 , . . . , Xn be a random sample from Be(θ). Show that the LRT of H0 : θ ≤ θ0
vs. H1 : θ > θ0 will reject H0 if Σ_{i=1}^n Xi > b.
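A numerical check can make the claim in Example 2 plausible before proving it. The sketch below assumes Be(θ) denotes the Bernoulli distribution and uses hypothetical values n = 10 and θ0 = 0.5. It relies on the fact that the Bernoulli likelihood is unimodal in θ, so the restricted MLE over θ ≤ θ0 is min(x̄, θ0).

```python
import math

def bernoulli_log_lik(theta, s, n):
    """Log-likelihood of n Bernoulli(theta) draws with s successes (0 * log 0 := 0)."""
    ll = 0.0
    if s > 0:
        ll += s * math.log(theta)
    if s < n:
        ll += (n - s) * math.log(1 - theta)
    return ll

def lrt_statistic_bernoulli(s, n, theta0):
    """LRT statistic for H0: theta <= theta0 as a function of s = sum of the x_i."""
    theta_hat = s / n                      # unrestricted MLE
    theta0_hat = min(theta_hat, theta0)    # restricted MLE over theta <= theta0
    return math.exp(bernoulli_log_lik(theta0_hat, s, n)
                    - bernoulli_log_lik(theta_hat, s, n))

n, theta0 = 10, 0.5                        # hypothetical values for illustration
lambdas = [lrt_statistic_bernoulli(s, n, theta0) for s in range(n + 1)]
for s, lam in enumerate(lambdas):
    print(f"sum x_i = {s:2d}   lambda = {lam:.5f}")
```

In this setting λ equals 1 whenever the sample proportion is at most θ0 and then decreases as the sum grows, so for c < 1 the region {λ(x) ≤ c} corresponds to a region of the form {Σ Xi > b}.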
• Type I Error: committed if and only if the test rejects H0 when it is true
• Type II Error: committed if and only if the test does not reject H0 when it is false
• Power function: The power function of a hypothesis test with critical/rejection region R
is defined as
β(θ) = P_θ(X ∈ R)
That is, the power function yields the probability, computed under the parameter value θ,
that the observed sample falls in the critical region.
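Since the power function is just a probability of landing in R, it can be approximated by simulation. The following is a rough Monte Carlo sketch under assumed settings that are not taken from the notes: an N(θ, 1) sample of size n = 25 and a hypothetical rejection region R = {x : √n |x̄ − θ0| > 1.96} with θ0 = 0.

```python
import math
import random

def in_rejection_region(xs, theta0=0.0, c=1.96):
    """Hypothetical region R = {x : sqrt(n) * |mean(x) - theta0| > c}."""
    n = len(xs)
    xbar = sum(xs) / n
    return math.sqrt(n) * abs(xbar - theta0) > c

def estimate_power(theta, n=25, reps=5000, seed=0):
    """Monte Carlo estimate of beta(theta) = P_theta(X in R)."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(theta, 1.0) for _ in range(n)]
        hits += in_rejection_region(xs)    # True counts as 1
    return hits / reps

estimates = {theta: estimate_power(theta) for theta in (0.0, 0.25, 0.5)}
for theta, beta in estimates.items():
    print(f"theta = {theta:4.2f}   estimated beta(theta) = {beta:.3f}")
```

At θ = θ0 the estimate approximates P(Type I Error); at any θ ≠ θ0 it approximates 1 − P(Type II Error).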
Remarks:
• An ideal test is one where P(Type I Error) = P(Type II Error) = 0. However, this is
generally not possible: for a fixed sample size, the two error probabilities are inversely
related, so reducing one typically increases the other. One approach is to jointly minimize
the two, but this does not give an indication of which error type is actually minimized.
A better alternative would be to bound the size of the more disastrous error type and,
subject to this, minimize the size of the other error type.
• Oftentimes, the consequences of a Type I Error are more serious than those of a Type II
Error. This guides us in how we set up our hypotheses.
• Ideally, a power function should have values close to 0 for parameter values corresponding
to H0 and close to 1 for parameter values corresponding to H1 . (Quick exercise: Try to
explain why such is the case.)
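The inverse relationship between the two error types can be seen in a small calculation. The sketch below assumes a hypothetical setup: an N(θ, 1) sample of size n = 25, H0 : θ = 0, a test that rejects when √n x̄ > c, and a single alternative value θ1 = 0.5 at which the Type II error is evaluated. None of these values come from the notes.

```python
import math
from statistics import NormalDist

Z = NormalDist()  # standard normal distribution

def error_probabilities(c, n=25, theta1=0.5):
    """Return (P(Type I), P(Type II at theta1)) for the test 'reject if sqrt(n)*xbar > c'."""
    type1 = 1 - Z.cdf(c)                          # rejecting when theta = 0
    type2 = Z.cdf(c - math.sqrt(n) * theta1)      # not rejecting when theta = theta1
    return type1, type2

cutoffs = (1.0, 1.645, 2.326, 3.0)
results = [error_probabilities(c) for c in cutoffs]
for c, (type1, type2) in zip(cutoffs, results):
    print(f"c = {c:5.3f}   P(Type I) = {type1:.4f}   P(Type II) = {type2:.4f}")
```

Raising the cutoff c makes rejection harder, which lowers P(Type I Error) but raises P(Type II Error); no choice of c drives both to zero.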
Example: Examine the power function for the two tests described below.
Let X ∼ Bi(5, θ). Consider testing
using: (1) a test that rejects H0 if and only if all "successes" are observed, and (2) a test that
rejects H0 if X = 3, 4, or 5. Which test is preferable?
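The two power functions in this example can be tabulated directly from the binomial probabilities. This is a small sketch; the function names and the grid of θ values are chosen here for illustration.

```python
from math import comb

def binom_pmf(k, n, theta):
    """P(X = k) for X ~ Bi(n, theta)."""
    return comb(n, k) * theta**k * (1 - theta)**(n - k)

def power_test1(theta, n=5):
    """Test 1 rejects iff X = n (all successes), so beta1(theta) = theta^n."""
    return binom_pmf(n, n, theta)

def power_test2(theta, n=5):
    """Test 2 rejects iff X is 3, 4, or 5."""
    return sum(binom_pmf(k, n, theta) for k in (3, 4, 5))

for i in range(11):
    theta = i / 10
    print(f"theta = {theta:.1f}   beta1 = {power_test1(theta):.4f}   "
          f"beta2 = {power_test2(theta):.4f}")
```

Test 2 rejects on a larger set, so its power function lies above that of Test 1 for every θ in (0, 1): it has a higher chance of rejecting both when H0 is false and when H0 is true.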
Def. For 0 ≤ α ≤ 1, a test with power function β(θ) is a size-α test if sup_{θ∈Θ0} β(θ) = α.
Def. For 0 ≤ α ≤ 1, a test with power function β(θ) is a level-α test if sup_{θ∈Θ0} β(θ) ≤ α.
Remarks:
• The size of a test gives the maximum probability of committing a Type I error.
• α is more commonly known as the level of significance. It represents the maximum
probability of a Type I error that one is willing to risk. Hence, we are usually content
with using a level-α test instead of a strictly size-α test.
• The set of level-α tests contains the set of size-α tests.
• Conventional values for α are 0.01, 0.05, and 0.1. The choice is arbitrary and dictated by
practical considerations. It must be set prior to the test itself. (Quick exercise: What is
the implication of choosing a higher value of α? What kind of values connote a "stricter"
test, in the sense that it would be more difficult to reject H0 ?)
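The distinction between size and level matters most for discrete data, where an exact size of α may be unattainable. The sketch below assumes, purely for illustration, X ∼ Bi(5, θ) with null parameter space Θ0 = [0, 0.5] and α = 0.05, and approximates the supremum over Θ0 on a fine grid. The function names and rejection sets are hypothetical.

```python
from math import comb

def power(theta, reject_values, n=5):
    """beta(theta) = P_theta(X in reject_values) for X ~ Bi(n, theta)."""
    return sum(comb(n, k) * theta**k * (1 - theta)**(n - k) for k in reject_values)

def size(reject_values, null_grid, n=5):
    """Approximate sup over Theta0 of beta(theta) using a grid of null values."""
    return max(power(theta, reject_values, n) for theta in null_grid)

null_grid = [i / 1000 for i in range(501)]   # assumed Theta0 = [0, 0.5]
alpha = 0.05

size_a = size({5}, null_grid)                # reject only if X = 5
size_b = size({4, 5}, null_grid)             # reject if X = 4 or 5

print(f"reject X = 5:      size = {size_a:.5f}, level-{alpha} test: {size_a <= alpha}")
print(f"reject X in (4,5): size = {size_b:.5f}, level-{alpha} test: {size_b <= alpha}")
```

Under these assumptions the first test is a level-0.05 test whose size is strictly below 0.05, while enlarging the rejection region to {4, 5} already pushes the size above 0.05. No rejection region of this nested form has size exactly 0.05, which is one reason level-α tests are accepted in practice.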
We end the discussion here, having addressed the task of bounding the Type I error. In the
next part, we address the equally important task of minimizing the Type II error subject to our
bound for the Type I error.
Other sources:
• https://rpsychologist.com/d3/nhst/ for a visualization of error probabilities