Adv Stat II
(Hypothesis testing)
Department of Mathematics
Hong Kong University of Science and Technology
August 4, 2008
Contents
4.8 Exercise
5 Unbiased Tests
5.1 Definitions
5.2 UMPU for one-parameter exponential family
5.2.1 Case I
5.2.2 Case II
5.2.3 A lemma used in the proof of the last theorem
5.2.4 Some examples
5.3 UMPU tests for multiparameter exponential families
5.3.1 Complete and Boundedly Complete Statistics
5.3.2 Similarity and Neyman Structure
5.3.3 UMPU tests for multiparameter exponential families
5.3.4 UMPU tests for linear combinations of parameters in multiparameter exponential families
5.3.5 Power calculation
5.3.6 Some examples
5.4 Summary
5.5 Exercises
6 Unbiased tests for special families (e.g., Normal and Gamma families)
6.1 Ancillary Statistics and Basu's Theorem
6.2 UMPU tests for multi-parameter exponential families
6.2.1 Review
6.2.2 UMPU tests for special families
6.2.3 Some basic facts about the bivariate normal distribution
6.2.4 Application 1: one-sample problem
6.2.5 Application 2: two-sample problem
6.2.6 Application 3: Testing for independence in the bivariate normal family
6.2.7 Application 4: Regression
6.2.8 Application 5: Non-normal example
6.3 The LSE in linear models
6.4 Summary
6.4.1 Exercises
7.6.2 Review: Asymptotic properties of MLE
7.6.3 Formulation of the problem
7.6.4 Asymptotic χ² approximation to the LRT
7.7 Wald's and Rao's tests and their relation with the LRT
7.8 LRT, Wald's and Rao's tests for independent but non-identically distributed r.v.'s
7.9 χ²-tests for the multinomial distribution
7.9.1 Preliminaries on the multinomial distribution
7.9.2 Tests for the multinomial distribution
7.9.3 Application: Goodness-of-fit tests
7.10 Test of independence in contingency tables
7.11 Some general comments
7.12 Exercises
Chapter 1
Definition:
(1). A hypothesis is a statement about a population parameter.
(2). The two complementary hypotheses in a hypothesis testing problem are
called the null hypothesis H0 and the alternative hypothesis H1 , respectively.
(3). The general format of a hypothesis testing problem is
H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1 ≡ Θ0^c,
where Θ0 and Θ1 ≡ Θ0^c are called the null and alternative parameter spaces, respectively.
Definition:
(i). Let φ(X) be a statistic taking values in [0, 1]. When X = x is observed,
we reject H0 with probability φ(x) and accept H0 with probability 1 − φ(x).
φ(x) is called the critical function, or test function or simply a test.
(ii). If φ(X) = 1 or 0 a.s., then φ(X) is a non-randomized test. Otherwise,
φ(X) is a randomized test.
(iii). The power function of the critical function φ(X) is
βφ(θ) = Eθ φ(X) = ∫ φ(x) dFθ(x).
Example 1.1.1 Given a coin, we wish to test whether it is loaded; for instance, whether p = P(head) < 1/2. Suppose we wish to test H0 : p < 1/2 against H1 : p ≥ 1/2. To do so, we toss the coin n = 2 times and let Sn denote the total number of heads. The possible values of Sn are 0, 1, 2. We might make the following decisions:
Decision A: Reject H0 if Sn = 1, 2; do not reject otherwise. Then,
φA(x) = 1 when x = 1, 2; = 0 when x = 0.
Decision B: Reject H0 if Sn = 2; reject with probability 1/2 if Sn = 1; do not reject if Sn = 0. Then,
φB(x) = 1 when x = 2; = 1/2 when x = 1; = 0 when x = 0.
Clearly, φA is a non-randomized test while φB is randomized. The corresponding power functions are
βφA(p) = Pp(Sn ∈ {1, 2}) = 1 − (1 − p)², and βφB(p) = Pp(Sn = 2) + (1/2)Pp(Sn = 1) = p² + p(1 − p) = p.
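As a quick numerical sanity check (our sketch, not part of the notes; numpy assumed, function names ours), the following simulates both decisions and compares the empirical rejection rates with the power functions just computed:

import numpy as np

rng = np.random.default_rng(1)

def power_sim(p, reps=100_000):
    """Empirical rejection probabilities of decisions A and B at a given p."""
    s = rng.binomial(2, p, size=reps)              # S_n for n = 2 tosses
    reject_a = (s >= 1)                            # decision A: reject if S_n = 1, 2
    reject_b = (s == 2) | ((s == 1) & (rng.random(reps) < 0.5))  # decision B
    return reject_a.mean(), reject_b.mean()

for p in (0.3, 0.5, 0.7):
    a, b = power_sim(p)
    print(p, a, 1 - (1 - p) ** 2, b)               # b should be close to p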
Definition. Suppose that a hypothesis test has power function β(θ). Let 0 ≤ α ≤ 1.
(1). supθ∈Θ0 β(θ) is called the true level of the test, or the size of the test.
(2). A test is said to be of size α if supθ∈Θ0 β(θ) = α.
(3). A test is said to be of level α if supθ∈Θ0 β(θ) ≤ α.
(We often refer to α as the level of the test, specified by the experimenters.
Typically, α = 0.01, 0.05, 0.10.)
Remark 1.2.1 For a fixed sample size, it is usually impossible to make both types of error probabilities arbitrarily small. For instance, if we minimize the probability of type I error by forcing α = 0, then we never reject H0 a.s. (i.e., we always accept H0 a.s.), which implies that prob(Type II error) = PH1 (accept H0) = 1. Similarly, if we let prob(Type II error) = 0, we get prob(Type I error) = 1.
Remark 1.2.2 In searching for a good test, it is common to control the Type I error
probability at a specified level. Within this class of tests, we then search for tests that have
minimum Type II error probability (or maximum power function).
Definition: A rejection region or critical region (C.R.) is the subset of the sample space on which we reject the null hypothesis. Its complement C.R.c is called the acceptance region.
Definition. The p-value for the sample point x is the smallest size for which this sample point x will lead to rejection of H0.
Theorem. Suppose that a test rejects H0 : θ ∈ Θ0 if and only if
T(X) ≥ C,
and let γ(C) = supθ∈Θ0 Pθ(T(X) ≥ C). Then the p-value for the sample point x is γ(T(x)) = supθ∈Θ0 Pθ(T(X) ≥ T(x)).
Proof. For the sample point x, we reject H0 : θ ∈ Θ0 if and only if
T (x) ≥ C. (3.1)
Note that γ(C) is non-increasing in C. From (3.1), the largest critical value C for which we would reject is C = T(x). So, the smallest level α for which we would reject H0 corresponds to the largest C for which we would reject H0, i.e., p-value = γ(T(x)) = supθ∈Θ0 Pθ(T(X) ≥ T(x)).
Remark 1.3.2
(1). Since rejection of H0 using a test with small size is more convincing
evidence that H1 is true than that with a large size, the interpretation of p-
value goes in the same way. The smaller the p-value, the stronger the sample
evidence that H1 is true. If the level of the test is set at α, then we would
reject H0 if the p-value ≤ α.
(Alternatively, if p-value ≤ α, the p-value is already a size at which x leads to rejection of H0, so we reject H0. If p-value > α, then α is smaller than the smallest size at which x leads to rejection of H0, so we cannot reject H0 at level α.)
(2). From the theorem, if the critical region has the form T(X) ≤ C, which is equivalent to −T(X) ≥ −C, then the p-value for the sample point x is supθ∈Θ0 Pθ(T(X) ≤ T(x)).
(3). The p-value is often called the observed size. Note that the p-value is a
function of the sample point x, hence a random variable itself.
(4). Notice that the definition of the p-value is only related to the null hy-
pothesis H0 , regardless of the alternative hypothesis H1 . Therefore, the actual
p-value indicates the strength of the evidence against the null hypothesis.
(5). When H0 is simple (i.e. Θ0 = {θ0 }), and the critical region is of the
form T (X) ≤ C or T (X) ≥ C, then the two methods shown in this section are
equivalent; see Homework 1.
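As a concrete illustration (our sketch, not from the notes; scipy assumed, function name ours), for the one-sided normal problem of the next section with rejection region of the form T(X) ≥ C, the p-value reduces to 1 − Φ(√n(x̄ − µ0)/σ0):

import numpy as np
from scipy.stats import norm

def p_value(xbar, mu0, sigma0, n):
    """p-value for H0: mu <= mu0 when we reject for large T = sqrt(n)(Xbar - mu0)/sigma0."""
    z = np.sqrt(n) * (xbar - mu0) / sigma0
    return norm.sf(z)        # the sup over mu <= mu0 is attained at mu = mu0

print(p_value(xbar=0.4, mu0=0.0, sigma0=1.0, n=25))   # z = 2.0 -> p ~= 0.023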
1.4 Some examples
1.4.1 Normal example
Let X1, . . . , Xn ∼ N(µ, σ0²) with σ0² known, and consider testing
H0 : µ ≤ µ0
H1 : µ > µ0.
1. Choose a reasonable test statistic and find the critical region of the test.
2. Find the power function β(µ) of the test.
3. Show that β(µ) is strictly increasing in µ.
4. If we want a test of level α, show that the critical value is C = Φ^{−1}(1 − α) and hence show that
limµ→µ0 β(µ) = β(µ0) = α.
5. Show that both type I error and type II error probabilities decrease as the sample
size n increases.
6. If we wish to have maximum Type I error probability of 0.1, and a maximum Type
II error probability of 0.2 if µ > µ0 + σ0 , find C and n to achieve these goals.
Solution:
4. We require that supµ≤µ0 β(µ) = β(µ0) = 1 − Φ(C) = α. So C = Φ^{−1}(1 − α). Hence we have
limµ→µ0 β(µ) = β(µ0) = 1 − Φ(C) = α.
6. Note that the power function β(µ) is an increasing function of µ. Also note that Θ0 = {µ : µ ≤ µ0} and the alternatives of interest are {µ : µ ≥ µ0 + σ0}. Therefore, the requirements in this problem become
supµ∈Θ0 β(µ) = 1 − Φ(C) = 0.1,
and
supµ≥µ0+σ0 [1 − β(µ)] = 1 − infµ≥µ0+σ0 β(µ) = Φ(C − √n) = 0.2.
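Numerically, the two displayed conditions pin down C and n; a small sketch of ours using scipy:

import math
from scipy.stats import norm

C = norm.ppf(0.9)                # 1 - Phi(C) = 0.1  ->  C ~= 1.2816
sqrt_n = C - norm.ppf(0.2)       # Phi(C - sqrt(n)) = 0.2
n = math.ceil(sqrt_n ** 2)       # smallest integer n meeting both goals
print(round(C, 4), n)            # 1.2816, n = 5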
Remark 1.4.1
(1). From (2), no matter how large n is, we can always find µ sufficiently close
to µ0 (but µ > µ0 ) so that the power β(µ) is arbitrarily close to β(µ0 ) = α.
Equivalently, the prob of type II error is close to 1 − α. That is, the prob
of accepting H0 when H1 is true is close to 1 − α, which is very high. This
means that not much significance can be attached to the acceptance
of H0 .
(2). However, the low power usually happens near µ0 , where we are willing
to tolerate low power. Such a region is called an indifference region. We are
not interested in values of µ in (µ0 , µ0 + δ) for some small δ > 0, since such
improvements are negligible. So (µ0 , µ0 + δ) would be our indifference region.
Outside the indifference region, we want guaranteed power. For instance, we’d
like to have
infµ≥µ0+δ β(µ) ≥ β0,
1.4.2 Binomial example
Let X1, . . . , X5 ∼ Bin(1, p), and consider testing
H0 : p ≤ 1/2
H1 : p > 1/2.
We can choose the test statistic T = Σ_{i=1}^5 Xi and reject H0 if T > C. For the decision (decision 1) that rejects H0 only when T = 5, the power function is
β1(p) = Pp(T = 5) = p^5.
(3). We can plot the two power functions β1(p) and β2(p) against p. (Clearly, β2(p) > β1(p) for p ∈ (0, 1), and βi(0) = 0, βi(1) = 1 for i = 1, 2.)
From the plot, we see that under decision 1 the probability of type I error is very small; in fact, for all p ≤ 1/2, we have
β1(p) = p^5 ≤ (1/2)^5 = 1/32 ≈ 0.03.
Chapter 2
2.1 Definitions
Definition: Let C be the class of all level α tests for testing H0 : θ ∈ Θ0 versus H1 : θ ∈ Θ1. A test φ in C is uniformly most powerful (UMP) if and only if, for any other test φ′ in C,
βφ(θ) ≥ βφ′(θ) for all θ ∈ Θ1.
Since it is not possible to minimize both type I and type II error probabilities for a fixed sample size, our objective is to select a test φ so as to maximize the power βφ(θ) for all θ ∈ Θ1 (i.e., to minimize the Type II error probability), subject to the condition Eθ φ(X) ≤ α for all θ ∈ Θ0.
Theorem (Neyman-Pearson Lemma). Consider testing H0 : X ∼ f0 (= fθ0) versus H1 : X ∼ f1 (= fθ1).
(i) (Existence of a UMP test.) For every α, there exists a UMP test of size α, which is equal to
φ(x) = 1 if f1(x) > Cf0(x); = γ if f1(x) = Cf0(x); = 0 if f1(x) < Cf0(x), (2.1)
where γ ∈ [0, 1] and C ≥ 0 (C = +∞ is allowed) are some constants chosen
so that
Eθ0 φ (X) = α.
(ii) (Uniqueness.) If φ∗ is a UMP test of size α, i.e. Eθ0 φ∗(X) = α, then a.s. φ∗ agrees with the test in (2.1) on the set {x : f1(x) ≠ Cf0(x)}.
Proof. (i). Note that Eθ0 φ (X) = α is equivalent to Eθ0 φ (X) = Eθ0 [φ (X) I{f0 (X) > 0}] .
So we only need to consider the set where f0 (x) > 0.
First we show that there exists C and γ such that the test φ is of size α, i.e., Eθ0 φ (X) = α.
(3). Now consider 0 < α < 1. Let α(t) = Pθ0(f1(X) ≤ tf0(X)) = Pθ0(f1(X)/f0(X) ≤ t). So α(t) is a cdf: it is non-decreasing and right-continuous, α(−∞) = 0, α(∞) = 1, and α(t) − α(t−) = Pθ0(f1(X)/f0(X) = t). Let C be such that α(C−) ≤ 1 − α ≤ α(C). Set
γ = (α − [1 − α(C)]) / (α(C) − α(C−)) if α(C−) ≠ α(C); γ = 0 if α(C−) = α(C).
Then
Eθ0 φ(X) = Pθ0(f1(X) > Cf0(X)) + γPθ0(f1(X) = Cf0(X))
= [1 − α(C)] + γ(α(C) − α(C−))
= [1 − α(C)] + {α − [1 − α(C)]} = α.
Next we'll show that φ is a UMP test. Suppose that φ′ is any other test with Eθ0 φ′(X) ≤ α. We need to show that βφ(θ1) = Eθ1 φ(X) ≥ Eθ1 φ′(X) = βφ′(θ1).
We claim that the following inequality always holds:
[φ(x) − φ′(x)][f1(x) − Cf0(x)] ≥ 0 for all x. (2.2)
Proof of (2.2). Assume that 0 < γ < 1 for simplicity. (The cases γ = 1 or 0 can be proved similarly.) Then,
(a). If φ′(x) − φ(x) > 0, then φ(x) < φ′(x) ≤ 1, so φ(x) < 1 and hence f1(x) ≤ Cf0(x). So (2.2) holds.
(b). If φ′(x) − φ(x) < 0, then φ(x) > φ′(x) ≥ 0, so φ(x) > 0 and hence f1(x) ≥ Cf0(x). Again (2.2) holds.
(c). If φ′(x) − φ(x) = 0, clearly (2.2) holds.
Finally, integrating (2.2), we have
0 ≤ ∫ [φ(x) − φ′(x)][f1(x) − Cf0(x)] dx = [Eθ1 φ(X) − Eθ1 φ′(X)] − C[Eθ0 φ(X) − Eθ0 φ′(X)],
so that Eθ1 φ(X) − Eθ1 φ′(X) ≥ C[α − Eθ0 φ′(X)] ≥ 0, as was to be proved.
Proof of (ii). Define A = {x : φ(x) ≠ φ∗(x)} ∩ {x : f1(x) ≠ Cf0(x)}; that is, A is the set on which φ and φ∗ differ, intersected with the set {x : f1(x) ≠ Cf0(x)} (that is, either the set {x : f1(x) > Cf0(x)} or the set {x : f1(x) < Cf0(x)}). We shall now show that the set A has measure 0.
Note from the proof of part (i) that [φ(x) − φ∗(x)][f1(x) − Cf0(x)] ≥ 0, with strict inequality on A, while
∫ [φ(x) − φ∗(x)][f1(x) − Cf0(x)] dx = [Eθ1 φ(X) − Eθ1 φ∗(X)] − C[Eθ0 φ(X) − Eθ0 φ∗(X)] = 0,
where the last line follows since both φ(x) and φ∗ (x) are UMP of size α. Thus, the set A
has measure 0. The theorem is proved.
Remark 2.2.1 The theorem shows that, when both H0 and H1 are simple, there exists a
UMP test that can be determined by (2.1) uniquely (up to sets of measure 0) except on
the set B = {x : f1 (x) = Cf0 (x)}.
Remark 2.2.2 (1). If P(X ∈ B) = 0, then the UMP test of size α is (a.s.) unique and non-randomized.
(2). If P(X ∈ B) > 0, then the UMP tests are randomized on the set B, and the randomization is necessary for the UMP tests to have the given size α.
Remark 2.2.3 In the N-P lemma, we don’t have to consider the case where both f1 (x)
and f0 (x) are equal to 0 since in that case, we can always shrink the sample space. There-
fore, we can rewrite the UMP test as
φ(x) = 1 if f1(x)/f0(x) > C; = γ if f1(x)/f0(x) = C; = 0 if f1(x)/f0(x) < C.
Here, we regard 1/0 = ∞. (This happens when f0(x) = 0 but f1(x) > 0.)
2.3 The relationship between the power and the size
Theorem 2.3.1 Let β be the power of the UMP test of size α, where 0 ≤ α ≤ 1, for testing H0 : θ = θ0 versus H1 : θ = θ1, where θ0 ≠ θ1. Then:
(i). α ≤ β.
(ii). If 0 < α < 1, then α < β unless f0 = f1 a.e.
Proof. (i). Choose φ0 (x) ≡ α. Clearly, Eθ0 φ0 (X) = Eθ1 φ0 (X) = α. That is, this test φ0
is of size α with power α. Since β is the power of a UMP, then we have α ≤ β.
(ii). Suppose 0 < α < 1. From (i), α ≤ β. Now let us show by contradiction that we must have α < β. If not, then α = β (< 1). Therefore, φ0(x) ≡ α (= β) is the UMP test and must satisfy (2.1). Then we must have f1(x) = Cf0(x) a.e. (Otherwise, φ0(x) would be either 1 or 0, in contradiction with the assumption 0 < φ0(x) ≡ α < 1.) Integrating both sides, we get C = 1, hence f1(x) = f0(x) a.e., that is, θ0 = θ1.
Remark. We have seen that for a UMP test, β ≥ α. This is in fact a very basic requirement for any test. Otherwise, if α > β, we would reject H0 more often when it is true than when it is false, and even the trivial test φ0 ≡ α would be more powerful.
2.4 Some examples.
2.4.1 Binomial example.
Example. Let X1 , . . . , Xn ∼ Bin(1, p). Find a UMP test of size α for testing H0 : p = p0
versus H1 : p = p1 , where 0 < p0 < p1 < 1.
Solution. For x = (x1, ..., xn), fp(x) = p^{Σ_{i=1}^n xi} (1 − p)^{n − Σ_{i=1}^n xi}. Thus,
λ(x) ≡ f1(x)/f0(x) = (p1/p0)^{Σ_{i=1}^n xi} ((1 − p1)/(1 − p0))^{n − Σ_{i=1}^n xi}
is a strictly increasing function of Σ_{i=1}^n xi (since p1 > p0). Note that T = Σ_{i=1}^n Xi ∼ Bin(n, p). Therefore, from the N-P lemma, the UMP test of size α is
φ(x) = 1 if λ(x) > C; = γ if λ(x) = C; = 0 if λ(x) < C,
or equivalently
φ(x) = 1 if T(x) > m; = γ if T(x) = m; = 0 if T(x) < m,
where m and γ are determined by Ep0 φ(X) = Pp0(T > m) + γPp0(T = m) = α.
If α = Σ_{j=m+1}^n C_n^j p0^j (1 − p0)^{n−j} for some integer m, then we can choose γ = 0, in which case the UMP test φ is a nonrandomized test. Otherwise, it is a randomized test.
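To make the construction concrete, here is a small numerical sketch (ours, not part of the notes; scipy assumed, function name ours) that computes m and γ for given n, p0 and α:

from scipy.stats import binom

def ump_binomial(n, p0, alpha):
    """Constants (m, gamma) of the randomized UMP test:
    reject if T > m; reject with probability gamma if T = m."""
    m = 0
    while binom.sf(m, n, p0) > alpha:   # smallest m with P(T > m) <= alpha
        m += 1
    tail = binom.sf(m, n, p0)           # P(T > m)
    atom = binom.pmf(m, n, p0)          # P(T = m)
    gamma = (alpha - tail) / atom if atom > 0 else 0.0
    return m, gamma

m, gamma = ump_binomial(10, 0.5, 0.05)  # e.g. n = 10, p0 = 0.5 -> m = 8, gamma ~= 0.893
print(m, gamma)   # size check: binom.sf(m,10,0.5) + gamma*binom.pmf(m,10,0.5) = 0.05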
2.4.2 Normal example.
Example. Let X1 , . . . , Xn ∼ N (µ, σ02 ). Find a UMP test of size α for testing H0 : µ = µ0
versus H1 : µ = µ1 , where µ1 < µ0 .
Solution. The joint density of X = (X1, . . . , Xn) is fµ(x) = (2πσ0²)^{−n/2} exp{−Σ_{i=1}^n (xi − µ)²/(2σ0²)}. So
λ(x) ≡ f1(x)/f0(x) = exp{ [2(µ1 − µ0) Σ_{i=1}^n xi − n(µ1² − µ0²)] / (2σ0²) }
is a strictly decreasing function of x̄ (since µ1 < µ0). Therefore, from the N-P lemma, a UMP test of size α is
φ(x) = 1 if λ(x) > C; = 0 if λ(x) < C,
or equivalently
φ(x) = I{x̄ < m}, where m = µ0 + Φ^{−1}(α) σ0/√n is determined by Eµ0 φ(X) = Pµ0(X̄ < m) = α.
This is a nonrandomized test. In terms of the critical region, we reject H0 if
X̄ < µ0 + Φ^{−1}(α) σ0/√n.
Remark 2.4.1 Note in the last example that the UMP test does not depend on the alternative µ1. Therefore, the test is a UMP test for testing H0 : µ = µ0 versus H1 : µ < µ0. See the section on monotone likelihood ratio later.
2.4.3 Testing Normal Against Double Exponential.
(PLOT HERE.)
2.5 The Neyman-Pearson Lemma in terms of suffi-
cient statistics
Remark 2.5.1 In the two examples in the last section, we have seen that the UMP tests
are functions of the sufficient statistics. This is in fact true in general. To see why,
suppose that T = T (X) is a sufficient statistic, then from the Neyman Factorization
Theorem, we have
fθ (x) = g (T (x), θ) h (x) .
Assume fθ(x) ≠ 0 for either θ = θ0 or θ = θ1; otherwise, we can shrink the sample space. Then f1(x)/f0(x) = g(T(x), θ1)/g(T(x), θ0) wherever h(x) ≠ 0. Therefore, from the N-P lemma, the UMP test is
φ(x) = 1 if g(T(x), θ1) > Cg(T(x), θ0); = γ if equal; = 0 otherwise,
which is a function of T(x). In other words, in our search for a UMP test for simple hypotheses, we only need to concentrate on those tests which are functions of sufficient statistics.
Remark 2.5.2 Let φ(x) be a test, based on the observations x. Suppose that T is a
sufficient statistic, then we can define a new test, based on T by
η(T ) = Eθ (φ(X)|T ) .
(It is a proper test since 0 ≤ φ(X) ≤ 1 implies 0 ≤ η(T) ≤ 1; by sufficiency, the conditional expectation does not depend on θ.) Note that
Eθ η(T) = Eθ [E(φ(X)|T)] = Eθ φ(X) for all θ.
In other words, the power function of the original test φ(X) is the same as that of the test based on the sufficient statistic.
Chapter 3
In this chapter we consider one-sided hypotheses, of the form
(a) H0 : θ ≤ θ0 versus H1 : θ > θ0, or
(b) H0 : θ ≥ θ0 versus H1 : θ < θ0.
Definition. The family of densities {fθ(x) : θ ∈ Θ ⊂ R} is said to have monotone likelihood ratio (MLR) in a statistic T(x) if, for any θ1 < θ2, fθ2(x)/fθ1(x) is a non-decreasing function of T(x) on the set where at least one of fθ1(x) and fθ2(x) is positive.
Remark 3.2.2 The above definition implies that T = T(X) is a 1-dim statistic; if it were more than 1-dim, a monotone function of it would be difficult to define.
The following lemma states a useful result for a family with MLR.
Lemma 3.2.1 (MLR Lemma) Suppose that the family of distributions fθ (x) of X has
MLR in T (x).
(i). If ψ(t) is a nondecreasing function of t, then h(θ) = Eθ [ψ(T )] = Eθ [ψ[T (X)]]
is a nondecreasing function of θ.
(ii). For any θ1 < θ2, the cumulative distribution functions of T(X) under θ1 and θ2 satisfy
Pθ1 [T(X) > t] ≤ Pθ2 [T(X) > t], for all t,
or equivalently F_{T,θ1}(t) ≥ F_{T,θ2}(t) for all t. (That is, if Ti ∼ F_{T,θi}, i = 1, 2, then T1 is stochastically smaller than T2.)
Proof. (i). For θ1 < θ2, define
A = {x : fθ1(x) > fθ2(x)}, a = sup_{x∈A} ψ(T(x)),
B = {x : fθ1(x) < fθ2(x)}, b = inf_{x∈B} ψ(T(x)).
By MLR, fθ2(x)/fθ1(x) < 1 on A and > 1 on B, so T(x) ≤ T(y) whenever x ∈ A and y ∈ B; since ψ is non-decreasing, a ≤ b. Thus
h(θ2) − h(θ1) = Eθ2 [ψ(T(X))] − Eθ1 [ψ(T(X))]
= ∫ ψ(T(x)) [fθ2(x) − fθ1(x)] dx
= ∫_{x∈A} ψ(T(x)) [fθ2(x) − fθ1(x)] dx + ∫_{y∈B} ψ(T(y)) [fθ2(y) − fθ1(y)] dy
≥ a ∫_{x∈A} [fθ2(x) − fθ1(x)] dx + b ∫_{y∈B} [fθ2(y) − fθ1(y)] dy
= (b − a) ∫_{y∈B} [fθ2(y) − fθ1(y)] dy (since ∫_A [fθ2 − fθ1] dx = −∫_B [fθ2 − fθ1] dy)
≥ 0.
(ii). In (i), simply take ψ(T) = I{T > t0}, which is a non-decreasing function of T.
Remark 3.2.3 A one-parameter exponential family fθ(x) = c(θ)h(x) exp{η(θ)T(x)} with η(θ) non-decreasing has MLR in T(x); examples include the Normal (known variance), Binomial, and Poisson families.
Examples.
• Let X1 , . . . , Xn ∼ U (0, θ), where θ > 0. Show that the family has MLR in X(n) .
Proof. First, the pdf of X = (X1, . . . , Xn) is fθ(x) = θ^{−n} I{0 < x(n) < θ}. For any θ1 < θ2,
fθ2(x)/fθ1(x) = (θ1^n/θ2^n) · I{0 < x(n) < θ2}/I{0 < x(n) < θ1}
is a non-decreasing function of x(n) for values of x at which at least one of fθ1(x) and fθ2(x) is positive. By definition, the family has MLR in X(n).
• Show that the following families (which do not belong to the exponential family)
have MLR:
• Show that the Cauchy distribution family {Cauchy(θ)} does not have any MLR in
x, but it has stochastic ordering.
Proof. Left as an exercise.
• Show that the Uniform distribution family U (θ, θ + 1) does not have any MLR in x.
Proof. Left as an exercise.
Theorem 3.2.1 Let θ be a real parameter, and let the random variable X have probability
density fθ (x) with MLR in T (x). Consider the problem of testing H0 : θ ≤ θ0 against
H1 : θ > θ0 , where θ0 is a given constant.
(i). There exists a (not necessarily unique) UMP test of size α given by
φ(x) = 1 if T(x) > C; = γ if T(x) = C; = 0 if T(x) < C, (2.3)
where C and γ are determined by
β(θ0 ) ≡ Eθ0 φ(X) = α.
(ii). The power function βφ (θ) ≡ Eθ φ(X) of the test is always nondecreasing.
(iii). For every θ′, the test φ is UMP for testing H0′ : θ ≤ θ′ against H1′ : θ > θ′ with size α′ = β(θ′).
(iv). The test φ minimizes βφ′(θ) for any θ < θ0 amongst all tests φ′ satisfying Eθ0 φ′(X) = α.
Proof.
(i). First consider testing H0 : θ = θ0 against H1 : θ = θ1 with any θ1 > θ0 .
Firstly we’ll construct a test of size α. Similar to the proof of N-P lemma, we
can show
Lemma: There exists a test of size α given by
φ(x) = 1 if T (x) > C
= γ if T (x) = C
= 0 if T (x) < C, (2.2)
where C and γ are determined by
β(θ0 ) ≡ Eθ0 φ(X) = α.
Proof. Define α(t) = Pθ0 (T (X) ≤ t). Clearly, α(t) is a cdf, and it
is non-decreasing and right-continuous, α(−∞) = 0, α(∞) = 1 and
α(t) − α(t−) = Pθ0 (T (X) = t).
For any 0 < α < 1 (the cases α = 0, 1 are easy to prove), we can choose C such that α(C−) ≤ 1 − α ≤ α(C). Set
γ = (α − [1 − α(C)]) / (α(C) − α(C−)) if α(C−) ≠ α(C); γ = 0 if α(C−) = α(C).
(Note that γ takes the value 1 if α(C−) = 1 − α.)
Then if α(C−) ≠ α(C), we have
Eθ0 φ(X) = Pθ0(T(X) > C) + γPθ0(T(X) = C) = [1 − α(C)] + {α − [1 − α(C)]} = α.
On the other hand, if α(C−) = α(C), then α(C) = 1 − α, hence
Eθ0 φ(X) = Pθ0(T(X) > C) = 1 − α(C) = α.
Secondly, we'll show that the test φ is a UMP test. Denote λ(x) ≡ f1(x)/f0(x) = g[T(x)], and for T(x) = C denote
k ≡ g(C) = f1(x)/f0(x) |_{T(x)=C}.
Since g[T(x)] is non-decreasing in T(x), we have:
(a) if f1(x)/f0(x) = g[T(x)] > k, then we must have T(x) > C (otherwise g[T(x)] ≤ k, in contradiction with the assumption); thus from (2.2), we get φ(x) = 1.
(b) if f1(x)/f0(x) = g[T(x)] < k, then T(x) < C, and thus φ(x) = 0.
Hence φ is of the Neyman-Pearson form with critical value k, and so by the N-P lemma it is most powerful for testing θ = θ0 against θ = θ1.
Since φ does not depend on θ1, it follows that φ is a UMP test of size α for testing H0 : θ = θ0 versus H1 : θ > θ0.
It remains to verify that φ is of size α for the full null hypothesis H0 : θ ≤ θ0, i.e., that supθ≤θ0 Eθ φ(X) = α. To show this, we note that we can write φ(X) = ψ(T(X)) = ψ(T), where ψ(t) is nondecreasing in t. Then applying the MLR Lemma (or part (ii) of the present theorem), we get Eθ φ(X) = Eθ ψ(T) ≤ Eθ0 ψ(T) = α for all θ ≤ θ0, as required.
(ii). Recall βφ(θ) ≡ Eθ φ(X) = Eθ ψ(T), where φ(X) = ψ(T) is given in part (i) or (2.3). For any θ′ < θ″, the MLR Lemma gives βφ(θ′) = Eθ′ ψ(T) ≤ Eθ″ ψ(T) = βφ(θ″).
(iii). The proof is very similar to part (i), and hence omitted.
(iv). Fix θ1 < θ0 and consider testing
H0 : θ ≥ θ0,
H1 : θ = θ1, θ1 < θ0.
By an argument symmetric to part (i), the test 1 − φ is UMP for this problem among all tests 1 − φ′ with Eθ0 [1 − φ′(X)] = 1 − α. Clearly, then, for any θ < θ0,
Eθ [1 − φ(X)] ≥ Eθ [1 − φ′(X)], i.e., Eθ φ(X) ≤ Eθ φ′(X).
Remark 3.2.4 The test φ(x) in the theorem is a UMP test, but it may not be unique.
We shall give some examples a bit later in the chapter. But basically, if
T(x) > C ⟺ f1(x)/f0(x) > C1,
then φ(x) will be a unique UMP test. On the other hand, the unique test given by
f1(x)/f0(x) > C1
may yield many different UMP tests in terms of T if the correspondence is NOT one-to-one.
Theorem 3.2.2 Let θ be a real parameter, and let the random variable X have probability density fθ(x) with MLR in T(x). Consider the problem of testing H0 : θ ≥ θ0 against H1 : θ < θ0, where θ0 is a given constant. Then the conclusions of Theorem 3.2.1 hold with all inequalities reversed; in particular, there exists a UMP test of size α of the form φ(x) = 1 if T(x) < C; = γ if T(x) = C; = 0 if T(x) > C.
For the one-parameter exponential family, the test is really simple, as shown below.
Corollary 3.2.1 Suppose X has density in the one-parameter exponential family
fθ(x) = c(θ)h(x) exp{η(θ)T(x)}, (2.4)
where η(θ) is strictly increasing in θ. Then the family has MLR in T(x), and the UMP test of size α for H0 : θ ≤ θ0 versus H1 : θ > θ0 rejects for large values of T(x), as in Theorem 3.2.1.
Normal example. Let X1, . . . , Xn ∼ N(µ, σ0²) with known σ0². Consider testing H0 : µ ≤ µ0 versus H1 : µ > µ0; find a UMP test of size α.
Solution. The joint density of X = (X1, . . . , Xn) belongs to the one-parameter family (2.4) with T(X) = X̄ and η(µ) = nµ/σ0². From the corollary, the UMP test is
φ(x) = I{x̄ > C}, where C = µ0 + Φ^{−1}(1 − α) σ0/√n is determined by Eµ0 φ(X) = α.
Binomial example. Let X1, . . . , Xn ∼ Bin(1, p). Consider testing H0 : p ≤ p0 versus H1 : p > p0; find a UMP test of size α.
Solution. The joint density of X = (X1, . . . , Xn) belongs to the one-parameter family (2.4) with T(X) = ΣXi ∼ Bin(n, p) and η(p) = log(p/(1 − p)), which is an increasing function of p. From the corollary, the UMP test is
φ(x) = 1 when T(x) = Σxi > C; = γ when T(x) = Σxi = C; = 0 when T(x) = Σxi < C,
where C and γ are determined by Ep0 φ(X) = α.
Uniform example. (UMP tests may not be unique.) Let X1 , . . . , Xn ∼ U nif (0, θ)
with θ > 0. Consider testing H0 : θ ≤ θ0 versus H1 : θ > θ0 .
(i) Find a UMP test of size α.
(ii) Show that the following test φ′ is also UMP of size α:
φ′(x) = α if x(n) ≤ θ0; = 1 if x(n) > θ0.
Therefore, in this case, UMP tests are not unique. (But note that we are
dealing with composite hypotheses not simple ones.)
Solution. (i). The joint density of X = (X1, . . . , Xn) is in a family with MLR in X(n), which has density fX(n)(x) = nθ^{−n} x^{n−1} I{0 < x < θ} and
Pθ(X(n) > C) = (n/θ^n) ∫_C^θ x^{n−1} dx = 1 − C^n/θ^n.
Clearly, the UMP test can be taken nonrandomized. From the theorem, the UMP test is φ(x) = I{x(n) > C}, where C is determined by Pθ0(X(n) > C) = α, i.e.,
C = θ0 (1 − α)^{1/n},
which results in the power of φ at any θ > θ0 :
βφ(θ) = 1 − (θ0^n/θ^n)(1 − α). (3.5)
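A quick Monte Carlo check of the size and power formula (3.5) (our sketch; numpy assumed):

import numpy as np

rng = np.random.default_rng(2)
n, theta0, alpha, theta = 5, 1.0, 0.05, 1.3
C = theta0 * (1 - alpha) ** (1 / n)
x_max = rng.uniform(0, theta, size=(200_000, n)).max(axis=1)
print((x_max > C).mean(), 1 - (theta0 ** n / theta ** n) * (1 - alpha))  # should agree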
Remark 3.3.1 UMP tests may or may not be unique for composite hypotheses. Suppose that the family has MLR in T(x). Then the UMP test is unique if
fθ2(x)/fθ1(x) = g(T(x))
is strictly monotone in T(x). This can be seen from the proof of the main theorem. Note that this is the case for the Normal, Binomial, and Poisson examples above, but not for the Uniform example.
When MLR does not exist, a similar principle applies. See the example in the next subsection.
Example. Let X1, . . . , Xn ∼ U(θ, θ + 1). (i) Find a UMP test of size α for testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0. (ii) Does the family have MLR?
Solution. (i). Consider testing H0 : θ = θ0 v.s. H1 : θ = θ1, where θ0 < θ1, at size α. By the N-P lemma, the unique UMP test will reject H0 if
λ(x) ≡ fθ1(x1, ..., xn)/fθ0(x1, ..., xn) = I{θ1 < x(1) ≤ x(n) < θ1 + 1}/I{θ0 < x(1) ≤ x(n) < θ0 + 1} ≡ A/B > C, where C > 0.
We shall now try to find a UMP test in terms of T = (X(1) , X(n) ), which will be easier
to use in practice. (The UMP test given later is by no means unique. Can you
find some other ones?)
Here we reject H0 for large values of λ(x), which happens when
x(n) > θ0 + 1, or x(1) > θ0 + C, where C ≥ 0.
We now explain why this is the case.
(a). If x(n) > θ0 + 1, then H0 can not be true and must be rejected.
(b). Otherwise, if x(n) < θ0 + 1(< θ1 + 1), then test depends on the value of
x(1) . As x(1) increases, the value of λ(x) also changes from 0 to 1 (under the
assumption x(n) < θ0 + 1). So x(1) can be anywhere when we decide to reject
H0 , but we can choose x(1) > θ0 + C for some C ≥ 0. Here C is chosen to
make the test to be of size α.
We make one remark before we move on: there is no need to consider the case where both A and B are equal to zero. Therefore, the possible values of λ(x) are 0, 1, or ∞.
Coming back to the question, let us now determine C. Note that
α = Pθ0(X(1) > θ0 + C, or X(n) > θ0 + 1)
= Pθ0({X(1) > θ0 + C} ∪ ∅)
= Pθ0(X(1) > θ0 + C)
= [Pθ0(X1 > θ0 + C)]^n
= (1 − C)^n.
Therefore, C = 1 − α1/n . [It appears that x(n) > θ0 + 1 has no effect on the size of the
test, but it does on the power.]
Now that we have the UMP test for testing H0 : θ = θ0 v.s. H1 : θ = θ1, where θ1 > θ0, and it does not depend on θ1, it is also UMP for testing H0 : θ = θ0 v.s. H1 : θ > θ0. It remains to show that it is also UMP of size α for testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0. That is, we need to check that supθ≤θ0 Eθ φ(X) = α, as shown below.
supθ≤θ0 Eθ φ(X) = supθ≤θ0 Pθ(X(1) > θ0 + C, or X(n) > θ0 + 1)
= supθ≤θ0 Pθ(X(1) > θ0 + C)
= supθ≤θ0 [Pθ(X1 > θ0 + C)]^n
= supθ≤θ0 ((θ + 1) − (θ0 + C))^n
= ((θ0 + 1) − (θ0 + C))^n
= (1 − C)^n
= α.
(ii). From Remark 3.2.1, it suffices to consider the minimal sufficient statistic (X(1), X(n)). But the definition of MLR does not cover this more-than-one-dimensional case.
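A small simulation (ours, not from the notes) confirming that the test above has size (1 − C)^n = α at θ = θ0:

import numpy as np

rng = np.random.default_rng(3)
n, theta0, alpha = 4, 0.0, 0.05
C = 1 - alpha ** (1 / n)
x = rng.uniform(theta0, theta0 + 1, size=(200_000, n))   # sample at theta = theta0
reject = (x.min(axis=1) > theta0 + C) | (x.max(axis=1) > theta0 + 1)
print(reject.mean())   # ~ alpha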
3.4 Exercises
1. Show that the following families have an MLR:
(i). the logistic family fθ(x) = e^{−(x−θ)} / [1 + e^{−(x−θ)}]²;
(ii). the double exponential family fθ(x) = Ce^{−|x−θ|}.
2. Let X be one observation from the two problems in Question 1. Find a UMP test
of size α for testing H0 : θ ≤ 0 v.s. H1 : θ > 0.
Chapter 4
(a) H0 : θ = θ0
H1 : θ 6= θ0 ,
(b) H0 : θ1 ≤ θ ≤ θ2
H1 : θ < θ1 or θ > θ2
(c) H0 : θ ≤ θ1 or θ ≥ θ2
H1 : θ1 < θ < θ2 ,
We will see in the next section that UMP tests do not exist for the first two cases
(a) and (b), but they do exist for the case (c) for some special circumstances, such as
one-parameter exponential family and totally positive family.
Example. (Normal.) Let X1, . . . , Xn ∼ N(θ, σ0²), and consider testing H0 : θ = θ0 against H1 : θ ≠ θ0. Let φ1 and φ2 be the one-sided UMP tests of size α against θ < θ0 and θ > θ0, respectively. Plot the power functions of φi, βφi(θ) = Eθ φi(X), i = 1, 2; note that βφ1(θ) and βφ2(θ) are decreasing and increasing in θ, respectively.
From the plots of βφ1(θ) and βφ2(θ), we can see that, for any θ2 > θ0, we have βφ2(θ2) > βφ1(θ2). Thus, φ1 is not a UMP test of size α, since φ2 has a higher power than φ1 at θ2. But a UMP test for this problem would have to coincide with the one-sided UMP test φ1, a contradiction. Thus, no UMP test of size α exists for this problem.
One might hope that
φ3(x) = I{x̄ ≤ C1 or x̄ ≥ C2}
is a UMP test here. In order to have size α, we need to choose
C1 = θ0 − σ0 z_{α/2}/√n, C2 = θ0 + σ0 z_{α/2}/√n.
By plotting its power function against those of φ1 and φ2, it is easy to see that φ3 is not a UMP test.
Consider tests of the form
φ(x) = 1 if f1(x) > Cf0(x); = 0 if f1(x) < Cf0(x), (3.1)
together with the size condition
Eθ0 φ(X) = α. (3.2)
(ii). (Sufficiency.) If a test φ satisfies (3.1) and (3.2), then it is a UMP of size α.
(iii). (Necessity.) If φ is a UMP of size α, then for some C it satisfies (3.1).
4.3.2 Generalized N-P Lemma (GNP Lemma)
Lemma 4.3.1 (GNP Lemma) Let f1 (x), ..., fm+1 (x) be real-valued functions. Define
C = {φ(x) : 0 ≤ φ(x) ≤ 1, ∫ φ(x)fi(x)dx = ci, i = 1, ..., m},
C′ = {φ(x) : 0 ≤ φ(x) ≤ 1, ∫ φ(x)fi(x)dx ≤ ci, i = 1, ..., m}.
(i). Among all tests φ with 0 ≤ φ ≤ 1, there exists a test of the form
φ∗(x) = 1 if f_{m+1}(x) > Σ_{i=1}^m ki fi(x); = 0 if f_{m+1}(x) < Σ_{i=1}^m ki fi(x), (3.3)
for some constants k1, ..., km, that maximizes ∫ φ(x)f_{m+1}(x)dx over C.
(ii). If there exist constants k1 ≥ 0, ..., km ≥ 0 such that φ∗(x) in (i) is a member of C, then φ∗(x) maximizes ∫ φ(x)f_{m+1}(x)dx for all φ ∈ C′, that is,
∫ φ∗(x)f_{m+1}(x)dx ≥ ∫ φ(x)f_{m+1}(x)dx, for all φ ∈ C′.
(iii). Let M = {(∫ φ(x)f1(x)dx, ..., ∫ φ(x)fm(x)dx) : 0 ≤ φ ≤ 1}; M is convex and closed. (a) If (c1, ..., cm) is an inner point of M, then constants k1, ..., km as in (i) exist with φ∗ ∈ C. (b) The maximizing test is unique a.s., except possibly on the set {x : f_{m+1}(x) = Σ_{i=1}^m ki fi(x)}.
which leads to ∫ f_{m+1}(x)φ∗(x)dx ≥ ∫ f_{m+1}(x)φ(x)dx.
Corollary 4.3.1 Let f1 (x), ..., fm+1 (x) be pdf ’s (with respect to a probability measure
ν), and let 0 < α < 1. Then there exists a test φ of the form (3.3) such that
E1 φ(X) = ... = Em φ(X) = α, and E_{m+1} φ(X) > α,
unless f_{m+1}(x) = Σ_{i=1}^m ki fi(x) with probability 1.
Proof. The proof will be done by induction over m.
(i). If m = 1, consider testing H0 : X ∼ f1 v.s. H1 : X ∼ f2 . This reduces to
N-P Lemma.
(ii). Assume now that the theorem holds for any set of m ≥ 2 pdf’s, and now
consider the case m + 1.
If f1 (x), ..., fm (x) are linearly dependent, then the number of fi (x)’s can be
reduced and the result follows from the induction hypothesis.
Assume now that f1 (x), ..., fm (x) are linearly independent. Then for each
1 ≤ j ≤ m, by the induction hypothesis, there exist tests φj such that
E1 φj (X) = ... = Ej−1 φj (X) = Ej+1 φj (X) = ... = Em φj (X) = α,
Ej φj (X) > α.
Let M and C be given as in the GNP lemma. Then from the above, we get
(E1 φ1(X), α, α, ..., α) ∈ M, where E1 φ1(X) > α,
(α, E2 φ2(X), α, ..., α) ∈ M, where E2 φ2(X) > α,
......
(α, ..., α, Em φm(X)) ∈ M, where Em φm(X) > α.
Also, by taking φ(X) ≡ 0 and 1, respectively, we see that (0, ..., 0) ∈ M and (1, ..., 1) ∈ M.
Since M is convex and closed, we obtain that (α, ..., α) is an inner point of M. From the GNP Lemma, there exist k1, ..., km such that φ∗ ∈ C and φ∗
maximizes ∫ φ(x)f_{m+1}(x)dx for all φ ∈ C. In particular, we can take φ0(x) ≡ α; clearly, φ0 ∈ C. Then, we have
E_{m+1} φ∗(X) = ∫ φ∗(x)f_{m+1}(x)dx ≥ ∫ φ0(x)f_{m+1}(x)dx = α.
We shall now show that the equality cannot hold unless f_{m+1}(x) = Σ_{i=1}^m ki fi(x) with probability 1. Otherwise, if E_{m+1} φ∗(X) = α = E_{m+1} φ0(X), then φ0(x) ≡ α is also the optimal test. But from the uniqueness of the optimal test (i.e., GNP Lemma (iii)(b)) and the assumption that 0 < α < 1, we can use part (i) of the last theorem to get
f_{m+1}(x) = Σ_{i=1}^m ki fi(x), with probability 1.
Theorem 4.4.1 Let X have a density from the one-parameter exponential family fθ(x) = exp{η(θ)T(x) − ξ(θ)}h(x), where η(θ) is strictly increasing in θ. Consider testing
H0 : θ ≤ θ1 or θ ≥ θ2, where θ1 < θ2,
H1 : θ1 < θ < θ2.
(i). There exists a UMP test of size α given by
ψ(t) = 1 if C1 < t < C2; = γi if t = Ci (i = 1, 2); = 0 if t < C1 or t > C2, (4.4)
where t = T(x), C1 < C2, and the Ci's and γi's are determined by
Eθ1 ψ(T) = Eθ2 ψ(T) = α. (4.5)
(ii). The test φ in (i) minimizes βφ′(θ) = Eθ φ′(X) subject to Eθ1 φ′(X) = Eθ2 φ′(X) = α for every fixed θ < θ1 or θ > θ2.
(iii). Assume that η(θ) is also a continuous function of θ. For 0 < α < 1,
the power function of this test has a maximum at point θ0 between θ1 and θ2
and decreases strictly as θ tends away from θ0 in either direction, unless there
exist two values t1 , t2 such that Pθ (T (X) = t1 ) + Pθ (T (X) = t2 ) = 1 for all θ.
Proof. (i). One can restrict attention to the sufficient statistic T = T (X), which has
pdf of the form
gθ (t) = exp{η(θ)t − ξ(θ)}h1 (t),
whose proof is illustrated below for the discrete case only:
gθ(t) = Pθ(T(X) = t) = Σ_{T(x)=t} fθ(x) = Σ_{T(x)=t} exp{η(θ)T(x) − ξ(θ)}h(x)
= exp{η(θ)t − ξ(θ)} Σ_{T(x)=t} h(x) = exp{η(θ)t − ξ(θ)}h1(t).
First, let us find a UMP test of size α for testing
H0 : θ = θ1 or θ = θ2,
H1 : θ = θ3, where θ1 < θ3 < θ2,
subject to Eθ1 ψ(T) = Eθ2 ψ(T) = α. This is equivalent to the problem of maximizing Eθ3 φ(X) = Eθ3 ψ(T) subject to Eθ1 ψ(T) = Eθ2 ψ(T) = α.
Lemma: Let M be the set of all points (Eθ1 ψ(T ), Eθ2 ψ(T )) as ψ ranges over
the totality of critical functions. Then the point (α, α) is an inner point of M .
Proof. Apply Corollary 4.3.1 by checking the linear independence condition.
From the GNP lemma, there exist constants k1, k2 such that the maximizing test satisfies
ψ(t) = 1 if gθ3(t) > Σ_{i=1}^2 ki gθi(t); = 0 if gθ3(t) < Σ_{i=1}^2 ki gθi(t),
and
Eθ1 φ(X) = Eθ2 φ(X) = α.
Substituting gθi(t) = exp{η(θi)t − ξ(θi)}h1(t) for i = 1, 2, 3, and noting that η(θ) is strictly increasing, we can write
ψ(t) = 1 if exp{η(θ3)t − ξ(θ3)}h1(t) > Σ_{i=1}^2 ki exp{η(θi)t − ξ(θi)}h1(t); = 0 if the reverse strict inequality holds;
equivalently, dividing both sides by exp{η(θ3)t − ξ(θ3)}h1(t) > 0,
ψ(t) = 1 if a1 e^{−b1 t} + a2 e^{b2 t} < 1; = 0 if a1 e^{−b1 t} + a2 e^{b2 t} > 1, where b1 = η(θ3) − η(θ1) > 0 and b2 = η(θ2) − η(θ3) > 0.
(a). Here a1 and a2 cannot both be ≤ 0, since then ψ(t) ≡ 1 (i.e., the test always rejects H0), so that α = 1, which is in contradiction with the assumption 0 < α < 1.
(b). If a1 ≤ 0 and a2 > 0, then a1 e^{−b1 t} + a2 e^{b2 t} is a strictly increasing function, so ψ becomes a one-sided test, and from the one-sided UMP theorem we know that the power function is strictly increasing. But this cannot be true in light of the condition (4.5).
(c). If a2 ≤ 0 and a1 > 0, we can show similarly that this case cannot be true.
(d). From the above discussion, it follows that a1 > 0 and a2 > 0. In such a case, a1 e^{−b1 t} + a2 e^{b2 t} is a convex function, approaching ∞ as t → ±∞. Thus the set where it is < 1 is an interval (C1, C2), and ψ(t) takes the form (4.4).
Since φ(x) does not depend on θ3, it is a UMP test for testing
H0 : θ = θ1 or θ = θ2,
H1 : θ1 < θ < θ2.
To complete the proof, we need to show that the test φ is of size α for the original problem. Take φ0 ≡ α, so Eθ φ0(X) ≡ α. From part (ii) of the theorem (whose proof will be given after this), we get Eθ ψ(T) ≤ Eθ φ0(X) = α for every θ < θ1 or θ > θ2. Also noting that Eθ1 ψ(T) = Eθ2 ψ(T) = α, we get supθ∈Θ0 Eθ ψ(T) = α. This proves part (i) of the theorem.
Proof of (ii). Fix θ4 < θ1. Minimizing Eθ4 ψ′(T) subject to Eθ1 ψ′(T) = Eθ2 ψ′(T) = α is equivalent to maximizing Eθ4 π′(T), where π′ = 1 − ψ′, subject to Eθ1 π′(X) = Eθ2 π′(X) = 1 − α. We will apply the GNP Lemma. Since (α, α) is an inner point of M defined earlier, so is (1 − α, 1 − α); thus there exist constants k1, k2 such that the maximizing test satisfies
π′(t) = 1 if exp{η(θ4)t − ξ(θ4)} > Σ_{i=1}^2 ki exp{η(θi)t − ξ(θi)}; = 0 if the reverse strict inequality holds;
equivalently (by dividing by exp{η(θ1)t − ξ(θ1)} on both sides),
π′(t) = 1 if a1 e^{b1 t} + a2 e^{b2 t} > k1; = 0 if a1 e^{b1 t} + a2 e^{b2 t} < k1.
As in part (i), the set where π′ = 0 is an interval, so the test ψ′(t) = 1 − π′(t) satisfies
ψ′(t) = 1 if C1 < t < C2; = γi if t = Ci; = 0 otherwise,
where C1 < C2 and the γi are determined by Eθ1 ψ′(T) = Eθ2 ψ′(T) = α. This takes the same form as the UMP test obtained in (i). Since UMP tests are unique from Theorem 4.4.2
form as the UMP test obtained in (i). Since UMP tests are unique from Theorem 4.4.2
below, we prove the theorem for the case θ4 < θ1 .
The case for θ4 > θ2 can be shown similarly. The proof is complete.
(iii). From the assumption, the density gθ(t) is a continuous function of θ since η(θ) is continuous. Therefore, β(θ) ≡ Eθ ψ(T) = ∫ ψ(t)gθ(t)dt is continuous in θ. Without loss of generality, let η(θ) = θ.
If the claim in (iii) is not true, then there exist three points θ′ < θ″ < θ‴ such that β(θ″) ≤ β(θ′) = β(θ‴) = c, say. Then 0 < c < 1, since β(θ′) = Eθ′ ψ(T) = 0 (or 1) would imply ψ(t) = 0 (or 1) a.s. w.r.t. θ′ (and thus w.r.t. any θ, due to the fact that the support of the exponential family is the same for all parameters θ), but this is excluded by the assumption Eθ1 ψ(T) = Eθ2 ψ(T) = α with 0 < α < 1. As is seen from the proof of (i), the test ψ maximizes Eθ″ ψ(T) subject to Eθ′ ψ(T) = Eθ‴ ψ(T) = c, for all θ″ such that θ′ < θ″ < θ‴. The test is also unique a.s. by Theorem 4.4.2. However, unless T takes on at most two values with probability 1 for all θ, the densities gθ′, gθ″ and gθ‴ are linearly independent, which implies β(θ″) > c by Corollary 4.3.1, contradicting the earlier result. The proof is thus complete.
4.4.1 Uniqueness of UMP tests
The constants Ci s and γi ’s given in the test of the last theorem are in fact unique. To
prove this, we need to introduce a lemma.
Lemma 4.4.1 Suppose that X has a p.d.f. in {fθ(x) : θ ∈ Θ ⊂ R} with MLR in x. Let h be a function with a single change of sign, i.e., there exists a value x0 such that
h(x) ≤ 0 for x < x0 and h(x) ≥ 0 for x ≥ x0. (4.6)
(i). There exists θ0 (possibly infinite) such that Eθ [h(X)] ≤ 0 for θ < θ0 and Eθ [h(X)] ≥ 0 for θ > θ0.
(ii). If, in addition, the MLR is strict and Pθ(h(X) ≠ 0) > 0, then Eθ [h(X)] < 0 for θ < θ0 and Eθ [h(X)] > 0 for θ > θ0.
Proof. (i). Let θ1 < θ2 . Assume that Eθ1 [h(X)] > 0. We shall show that
Eθ2 [h(X)] ≥ 0. That is, once the function h(θ) = Eθ [h(X)] becomes positive at some
point θ1 , say, then after that point, it can never become negative (but can be equal to
zero).
To prove this, let us first show that c := fθ2(x0)/fθ1(x0) ∈ [0, ∞), and note that by MLR,
fθ2(x)/fθ1(x) ≤ c for x < x0 and fθ2(x)/fθ1(x) ≥ c for x ≥ x0, (4.7)
wherever fθ1(x) > 0.
Define S = {x : fθ1(x) = 0, and fθ2(x) > 0}. Then S^c = {x : fθ1(x) > 0, or fθ2(x) = 0}. Clearly, if x ∈ S, then fθ2(x)/fθ1(x) = ∞ > c, which implies that x ≥ x0; that is, we have S ⊂ [x0, ∞). Hence, by (4.6),
∫_S h(x)fθ2(x)dx ≥ 0. (4.8)
Thus,
Eθ2 [h(X)] = (∫_S + ∫_{S^c}) h(x)fθ2(x)dx ≥ ∫_{S^c} h(x)fθ2(x)dx (from (4.8))
= ∫_{S^c} h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
= (∫_{{x<x0}∩S^c} + ∫_{{x≥x0}∩S^c}) h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
≥ ∫_{{x<x0}∩S^c} c h(x)fθ1(x)dx + ∫_{{x≥x0}∩S^c} c h(x)fθ1(x)dx (4.9)
(from (4.7) and (4.6))
= ∫_{S^c} c h(x)fθ1(x)dx + ∫_S c h(x)fθ1(x)dx (as the second term is 0)
= ∫ c h(x)fθ1(x)dx
= c Eθ1 [h(X)]
≥ 0 (as 0 ≤ c < ∞ and Eθ1 [h(X)] > 0 by assumption).
Finally, the result follows by letting θ0 = inf{θ : Eθ h(X) > 0}. Note that θ0 may be
infinite, if Eθ h(X) is either positive for all θ or negative for all θ; otherwise it is finite.
(ii). Let θ1 < θ2. Assume that Eθ1 [h(X)] ≥ 0. We shall show that Eθ2 [h(X)] > 0. That is, once the function θ ↦ Eθ [h(X)] becomes zero or positive at some point θ1, say, then after that point it remains strictly positive.
First, under the assumed conditions, we have fθ2(x)/fθ1(x) ∈ (0, ∞) for all x ∈ Cs, the common support (at the boundary of Cs we could have 0/0, which will be defined to be 1), and by strict MLR, with c := fθ2(x0)/fθ1(x0),
fθ2(x)/fθ1(x) < c for x < x0 and fθ2(x)/fθ1(x) > c for x > x0. (4.10)
Thus,
Eθ2 [h(X)] = ∫_{Cs} h(x)fθ2(x)dx = ∫_{Cs} h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
= (∫_{{x<x0}∩Cs} + ∫_{{x≥x0}∩Cs}) h(x) [fθ2(x)/fθ1(x)] fθ1(x)dx
> ∫_{{x<x0}∩Cs} c h(x)fθ1(x)dx + ∫_{{x≥x0}∩Cs} c h(x)fθ1(x)dx (4.11)
[from (4.10) and (4.6), and Pθ(h(X) ≠ 0) > 0]
= ∫_{Cs} c h(x)fθ1(x)dx
= c Eθ1 [h(X)].
So we have shown that Eθ2 [h(X)] > c Eθ1 [h(X)] ≥ 0. In particular, by choosing θ ≡ θ2 > θ1 ≡ θ0 with Eθ0 [h(X)] = 0, we get Eθ [h(X)] > c Eθ0 [h(X)] = 0 for θ > θ0. Similarly, we can show that Eθ [h(X)] < 0 for θ < θ0.
Remark 4.4.1 The lemma means that if h(x) has a single change of sign, then θ ↦ Eθ [h(X)] also has a single (strict) change of sign, provided the family has (strict) MLR.
The next theorem shows that Ci ’s and γi ’s in the UMP tests for two-sided hypothesis
for the exponential family (in the previous theorem) are uniquely determined.
Theorem 4.4.2 (i). If φ1 and φ2 are two tests of the form (4.4) satisfying Eθ1 φ1(X) = Eθ1 φ2(X) = α, and if the region {φ2(x) ≠ 0} is to the right of {φ1(x) ≠ 0} a.s. (see the following remark for more explanations), then βφ2(θ) < βφ1(θ) for θ < θ1 and βφ2(θ) > βφ1(θ) for θ > θ1.
(ii). If φ1 and φ2 satisfy not only Eθ1 φ1(X) = Eθ1 φ2(X) = α, but also Eθ2 φ1(X) = Eθ2 φ2(X) = α, then φ1(x) = φ2(x) a.s.
Proof. (i). Define h(t) = φ2 (t) − φ1 (t), then Eθ1 h(X) = 0 from the assumption. Since h
has a single change of sign, the result follows from the last lemma (ii).
(ii). From (i), it follows that φ1 can not be either to the left or to the right of φ2 . So
φ1 has to be in the middle of φ2 with probability 1 or the other way around. Suppose φ1
is in the middle of φ2. Then φ1 ≤ φ2 a.s. From the assumption, we have Eθi [φ2(X) − φ1(X)] = 0 for i = 1, 2, with φ2(X) − φ1(X) ≥ 0.
Then |φ2 (X) − φ1 (X)| = 0, a.s. w.r.t fθi (i = 1, 2). So, φ1 (X) = φ2 (X) a.s. w.r.t. fθ .
Remark 4.4.2 Suppose that φ1 and φ2 are two tests of the form (4.4).
(a) We say the region {φ2 (x) 6= 0} is to the right of {φ1 (x) 6= 0} if
(i) C1^{φ2} ≥ C1^{φ1} and C2^{φ2} ≥ C2^{φ1}, and at least one of the inequalities holds strictly; or
(ii) C1^{φ2} = C1^{φ1} and C2^{φ2} = C2^{φ1}, but γ1^{φ2} ≥ γ1^{φ1} and γ2^{φ2} ≥ γ2^{φ1}, and at least one of the inequalities holds strictly.
In (i) and (ii), if both inequalities hold strictly, we say that {φ2 (x) 6= 0} is
strictly to the right of {φ1 (x) 6= 0}.
(b) It can be shown easily that if {φ2 (x) 6= 0} is strictly to the right of
{φ1 (x) 6= 0}, then φ2 (x) − φ1 (x) has a strict change of signs (from negative to
positive).
However, if {φ2 (x) 6= 0} is to the right (but not strictly) of {φ1 (x) 6= 0},
then φ2 (x) − φ1 (x) is always non-negative or non-positive. In such cases, it
can be shown that Eθ [h(X)] = 0 has no solution.
(c) We can define the region {φ2 (x) 6= 0} to be (strictly) inside {φ1 (x) 6= 0}
and derive similar results to those in (a) and (b).
Remark 4.4.3 This theorem shows that Ci ’s and γi ’s are uniquely determined. It also
indicates how to determine them in practice. One can start with some initial trial values
C1∗ , γ1∗ , find C2∗ , γ2∗ such that β ∗ (θ1 ) = α, and compute β ∗ (θ2 ), which will usually be either
too large or too small. If β ∗ (θ2 ) < α, the correct acceptance region is to the right of the
chosen one, that is, it satisfies either C1 > C1∗ or C1 = C1∗ and γ1 < γ1∗ . The converse is
true if β ∗ (θ2 ) > α.
4.5 UMP tests for totally positive families.
The theorems given in the last section apply to the one-parameter families. However, they
hold for some other cases too. One such example is the case for totally positive families.
(See Lehmann, 1986, p119.)
Definition. A family of distributions with p.d.f.'s fθ(x), where θ and x are real-valued, is said to be totally positive of order r (TPr) if for all x1 < ... < xn and θ1 < ... < θn,
∆n = det[fθi(xj)]_{i,j=1}^n ≥ 0, for all n = 1, 2, ..., r.
Remark 4.5.1
(i). If all the above inequalities are strict, the family is called strictly totally positive of order r (STPr).
(ii). For r = 2, ∆2 ≥ 0 (or > 0) means fθ(x) has (strict) MLR in x.
Proof. ∆2 = fθ1(x1)fθ2(x2) − fθ2(x1)fθ1(x2) ≥ 0 implies
fθ2(x2)/fθ1(x2) ≥ fθ2(x1)/fθ1(x1).
(iii). Suppose a(θ) > 0, b(x) > 0. Then if fθ(x) is STPr, so is a(θ)b(x)fθ(x).
Remark 4.5.2 It follows from the lemma that the equation g(x) = 0 has at most two
solutions.
Theorem 4.5.1 Suppose that T = T(X) is a sufficient statistic whose density fθ(t) is STP3. Consider testing
H0 : θ ≤ θ1 or θ ≥ θ2, where θ1 < θ2,
H1 : θ1 < θ < θ2.
(i). There exists a UMP test φ of size α of the form (4.4), with the Ci's and γi's determined by Eθ1 φ(X) = Eθ2 φ(X) = α.
(ii). This test φ minimizes βφ′(θ) = Eθ φ′(X) subject to Eθ1 φ′(X) = Eθ2 φ′(X) = α for every fixed θ < θ1 or θ > θ2.
(iii). For 0 < α < 1, the power function of this test has a maximum at
point θ0 between θ1 and θ2 and decreases strictly as θ tends away from θ0
in either direction, unless there exist two values t1 , t2 such that Pθ (T (X) =
t1 ) + Pθ (T (X) = t2 ) = 1 for all θ.
Proof. The proof of the theorem is very similar to that of Theorem 4.4.1. For illustration,
here we only give a proof of part (i) below.
One can restrict attention to the sufficient statistic T = T (X). First let us find a
UMP test of size α for testing
H0 : θ = θ1 or θ2
H1 : θ = θ3 where θ1 < θ3 < θ2
subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α. This is equivalent to the problem of maximizing
Eθ3 φ(X) = Eθ3 ψ(T ) subject to Eθ1 ψ(T ) = Eθ2 ψ(T ) = α. Let M be the set of all points
(Eθ1 ψ(T ), Eθ2 ψ(T )) as ψ ranges over the totality of critical functions. Then the point
(α, α) is an inner point of M . From the GNP lemma, there exist constants a1 , a2 such
that the maximizing test satisfies
φ(x) = ψ(t) = 1 if fθ3(t) > a1 fθ1(t) + a2 fθ2(t); = 0 if fθ3(t) < a1 fθ1(t) + a2 fθ2(t),
and
Eθ1 φ(X) = Eθ2 φ(X) = α.
Then we have ψ(t) = 1 when g(t) < 0 and ψ(t) = 0 when g(t) > 0, where
g(t) = a1 fθ1(t) − fθ3(t) + a2 fθ2(t) = fθ3(t) [a1 fθ1(t)/fθ3(t) + a2 fθ2(t)/fθ3(t) − 1],
g1(t) = g(t)/fθ3(t) = a1 fθ1(t)/fθ3(t) + a2 fθ2(t)/fθ3(t) − 1 (note that fθ3(t) > 0).
We have the following situations:
(a). Assume a1 ≤ 0.
(a1). If a2 ≤ 0, then g(t) < 0, i.e., g1(t) < 0, for all t, hence ψ ≡ 1 (i.e., the test always rejects H0), so that α = 1, which is in contradiction with the assumption 0 < α < 1.
(a2). If a2 > 0, then g1(t) is a strictly increasing function. So ψ becomes a one-sided test and the power function can be shown to be strictly increasing. But this cannot be true in light of the condition (4.5).
(b). From (a), and the symmetric argument with a1 and a2 interchanged, we know that a1 > 0 and a2 > 0. Then applying Lemma 4.5.1, we conclude that {t : g1(t) < 0} is an interval, and hence ψ takes the form (4.4).
Since φ(x) does not depend on θ3, it is a UMP test for testing
H0 : θ = θ1 or θ = θ2,
H1 : θ1 < θ < θ2.
To complete the proof, take φ0 ≡ α; from part (ii), Eθ ψ(T) ≤ Eθ φ0(X) = α for every θ < θ1 or θ > θ2. Also noting that Eθ1 ψ(T) = Eθ2 ψ(T) = α, we get supθ∈Θ0 Eθ ψ(T) = α. This proves the theorem.
4.6 Some examples
4.6.1 Normal example.
Let X1 , . . . , Xn ∼ N (µ, 1). Then a UMP test of size α for testing H0 : µ ≤ µ1 or µ ≥ µ2
v.s. H1 : µ1 < µ < µ2 is
φ(x) = I{C1 < x̄ < C2},
where C1 and C2 are determined from
Eµi φ(X) = Pµi(C1 < X̄ < C2) = Φ(√n(C2 − µi)) − Φ(√n(C1 − µi)) = α, i = 1, 2.
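Since C1 and C2 are defined only implicitly by the two size conditions, one can solve for them numerically; a sketch of ours (scipy assumed; the starting guess is heuristic):

import numpy as np
from scipy.optimize import fsolve
from scipy.stats import norm

def solve_c1_c2(mu1, mu2, n, alpha):
    """Solve Phi(sqrt(n)(C2-mu_i)) - Phi(sqrt(n)(C1-mu_i)) = alpha for i = 1, 2."""
    rn = np.sqrt(n)
    def eqs(c):
        c1, c2 = c
        return [norm.cdf(rn * (c2 - mu1)) - norm.cdf(rn * (c1 - mu1)) - alpha,
                norm.cdf(rn * (c2 - mu2)) - norm.cdf(rn * (c1 - mu2)) - alpha]
    mid = 0.5 * (mu1 + mu2)
    return fsolve(eqs, [mid - 0.5, mid + 0.5])   # symmetric starting guess

c1, c2 = solve_c1_c2(-1.0, 1.0, 10, 0.05)
print(c1, c2)    # by symmetry, c1 = -c2 here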
Example. Let f and g be two known p.d.f.’s. Suppose that X has the p.d.f. θf (x) +
(1 − θ)g(x), 0 ≤ θ ≤ 1. Show that the test φ(x) ≡ α is a UMP test of size α for testing
H0 : θ ≤ θ1 or θ ≥ θ2 v.s. H1 : θ1 < θ < θ2 .
Solution. Suppose φ is a UMP test of size α, so supθ∈Θ0 Eθ φ(X) = α. Since the power function Eθ φ(X) = θ∫ φ(x)f(x)dx + (1 − θ)∫ φ(x)g(x)dx is linear in θ, the supremum over Θ0 is attained at an endpoint, so either Eθ1 φ(X) = α or Eθ2 φ(X) = α. For simplicity, we assume that Eθ1 φ(X) = α; then we must have Eθ2 φ(X) = α. Otherwise, if not, we would have
Eθ1 φ(X) = α, Eθ2 φ(X) < α.
However, since the power function is a continuous function of θ, we can then find θ ∈ (θ1, θ2) such that Eθ φ(X) < α, which implies that φ cannot be UMP (since its power is not greater than the power of the simple test φ0 ≡ α). Thus we have
Eθ1 φ(X) = Eθ2 φ(X) = α. (6.12)
Treating ∫ φ(x)f(x)dx and ∫ φ(x)g(x)dx as two unknown variables in the pair of equations
θi ∫ φ(x)f(x)dx + (1 − θi) ∫ φ(x)g(x)dx = α, i = 1, 2,
and noting that the determinant of the system is
det(θ1, 1 − θ1; θ2, 1 − θ2) = θ1 − θ2 ≠ 0,
we then have
∫ φ(x)f(x)dx = ∫ φ(x)g(x)dx = α.
From this, it follows that for any θ3 with θ1 < θ3 < θ2,
Eθ3 φ(X) = θ3 ∫ φ(x)f(x)dx + (1 − θ3) ∫ φ(x)g(x)dx = α.
Note also that Eθ3 [φ0(X)] = α for the trivial test φ0 ≡ α. By the uniqueness of the maximizing test in the GNP lemma, we get φ(X) = α, a.s. Since θ3 is arbitrary, we see that φ(X) ≡ α is a UMP test of size α for testing
H0 : θ ≤ θ1 or θ ≥ θ2
H1 : θ1 < θ < θ2 .
4.7 Summary
UMP tests do not usually exist for two-sided tests. But for one parameter exponential
family or strictly totally positive family, UMP tests do exist for one type of two-sided
tests. Generalized Neyman-Pearson Lemma plays a critical role in deriving the UMP
tests in these cases.
4.8 Exercise
1. Suppose X1, . . . , Xn is from the exponential family whose sufficient statistic T has density
gθ(t) = a(θ)b(t)e^{η(θ)t},
where η(θ) is strictly monotone in θ. Show that UMP tests do not exist for testing H0 : θ = θ0 v.s. H1 : θ ≠ θ0.
Chapter 5
Unbiased Tests.
5.1 Definitions.
In this chapter, we are interested in finding “optimal” tests for two-sided hypotheses of
the following types:
(a) H0 : θ = θ0
H1 : θ 6= θ0 ,
(b) H0 : θ1 ≤ θ ≤ θ2
H1 : θ < θ1 or θ > θ2
We’ve seen in the last chapter that UMP tests may NOT exist in these two cases. It is
therefore necessary to narrow down our search in a smaller class of tests. One such class
is that of unbiased tests.
Definition. A test φ for testing H0 : θ ∈ Θ0 v.s. H1 : θ ∈ Θ1 is said to be unbiased of level α if
(i). βφ(θ) ≤ α, if θ ∈ Θ0,
(ii). βφ(θ) ≥ α, if θ ∈ Θ1. (1.1)
Remark 5.1.1
(i). Note that the first condition is simply supθ∈Θ0 βφ(θ) ≤ α, i.e., φ is of level α.
(ii). If UMP tests exist, they must be unbiased.
(iii). The second requirement simply means that the test φ is no worse than
the silly test φ0 (x) ≡ α.
(iv). Another interpretation of (ii) is that the probability of rejecting H0 when
H1 is true should be at least as large as that of rejecting H0 when H0 is true,
i.e.,
P (Reject H0 |H1 is true) ≥ P (Reject H0 |H0 is true).
Definition. An unbiased test φ of level α for testing H0 : θ ∈ Θ0 v.s. H1 : θ ∈ Θ1 is said to be uniformly most powerful unbiased (UMPU) if for any other unbiased test φ′ of level α, we have βφ(θ) ≥ βφ′(θ) for all θ ∈ Θ1.
Lemma 5.1.1 If the power function βφ(θ) is continuous in θ and φ is an unbiased test of level α, then
βφ(θ) = α for all θ ∈ ω, (1.2)
where ω := ∂Θ0 ∩ ∂Θ1 is the common boundary of Θ0 and Θ1 (i.e., the set of points θ that are points or limit points of both Θ0 and Θ1).
Definition. Tests satisfying (1.2) are said to be similar on the boundary (SOB).
Proof of Lemma 5.1.1. For any point θ ∈ ω, there exists sequences θ01 , θ02 , θ03 , ..., ∈ Θ0
and θ11 , θ12 , θ13 ..., ∈ Θ1 , such that θ0n → θ and θ1n → θ as n → ∞. Since βφ (θ) is
continuous and the test φ is unbiased, we have
α ≥ lim_{n→∞} βφ(θ0n) = βφ(θ) = lim_{n→∞} βφ(θ1n) ≥ α,
hence βφ(θ) = α.
It is easier to work with (1.2) than (1.1), and the next lemma is useful in determining
a UMPU test.
Lemma 5.1.2 (Similarity Lemma) If X has p.d.f. {fθ (x) : θ ∈ Θ} such that the power
function of every test is continuous, and if φ0 is UMP SOB (i.e., among all tests satisfying
(1.2)) and is a level α test, then φ0 is UMPU.
Proof. All power functions are continuous by assumption. From Lemma 5.1.1, any
unbiased test must be similar on the boundary. Since φ0 is UMP of level α among all tests satisfying (1.2) by assumption, it is also uniformly at least as powerful as any unbiased test. Finally, we note that φ0 is itself unbiased, since it is at least as powerful as the trivial test φ(x) ≡ α (the latter being unbiased and also similar on the boundary). So φ0 is UMPU.
5.2 UMPU for One-parameter exponential family
5.2.1 Case I
Theorem 5.2.1 Let θ be a real-valued parameter, and X = {X1 , . . . , Xn } is a random
sample from a one-parameter exponential family with density fθ(x) = exp{θT(x) − ξ(θ)}h(x) (natural parameter θ). Consider testing
H0 : θ1 ≤ θ ≤ θ2,
H1 : θ < θ1 or θ > θ2 (where θ1 < θ2).
(i). There exists a UMPU test of size α given by
ψ(t) = 1 if t < C1 or t > C2; = γi if t = Ci; = 0 if C1 < t < C2, where t = T(x),
and the Ci's and γi's are determined by
Eθ1 ψ(T) = Eθ2 ψ(T) = α. (2.3)
(ii). The power function of this test, Eθ φ(X), has a minimum at point θ0
between θ1 and θ2 and increases strictly as θ tends away from θ0 in either
direction.
Proof. (i). It is easy to show that the power function Eθ φ(X) is continuous in θ, so one
can apply Lemma 5.1.2 (i.e., Similarity Lemma). Clearly the boundary set is ω = {θ1 , θ2 },
and by the lemma, we consider first the problem of maximizing Eθ0 φ(X) for θ0 outside
the interval [θ1 , θ2 ], subject to (2.3). This is equivalent to minimizing Eθ0 [1 − φ(X)] for θ0
outside the interval [θ1 , θ2 ], subject to
From Theorem 4.4.1, we see that 1 − φ(x) does minimize Eθ0 [1 − φ(X)] for θ0 6∈ [θ1 , θ2 ],
subject to (2.4). Therefore, the test φ(x) is UMP amongst those satisfying (2.3), and
hence is UMPU by the above lemma.
(ii). Continued from part (i), also from Theorem 4.4.1, Eθ [1 − φ(X)] has a maximum
at point θ0 between θ1 and θ2 and decreases strictly as θ tends away from θ0 in either
direction. That is, Eθ φ(X) has a minimum at point θ0 between θ1 and θ2 and increases
strictly as θ tends away from θ0 in either direction.
5.2.2 Case II
Theorem 5.2.2 Let θ be a real-valued parameter, and X1 , . . . , Xn is a random sample
from a one-parameter exponential family with density fθ(x) = exp{θT(x) − ξ(θ)}h(x). Consider testing
H0 : θ = θ0,
H1 : θ ≠ θ0.
(i). There exists a UMPU test of size α given by
ψ(t) = 1 if t < C1 or t > C2; = γi if t = Ci; = 0 if C1 < t < C2, (2.5)
where the constants are determined by
Eθ0 ψ(T) = α (2.6)
and
Eθ0 [T ψ(T)] = α Eθ0 [T]. (2.7)
(ii). The power function of this test, Eθ φ(X), has a minimum at θ = θ0 and increases strictly as θ tends away from θ0 in either direction.
Proof. (i). One can restrict attention to the sufficient statistic T = T (X), which has pdf
of the form
gθ (t) = exp{θt − ξ(θ)}h1 (t) = D(θ)eθt h1 (t).
Lemma 5.2.1 If ψ is an unbiased test of size α, then β(θ) = Eθ ψ(T) ≥ α = β(θ0) for all θ, so the power function has a minimum at θ = θ0 and hence
β′ψ(θ0) = 0. (2.8)
However,
β′ψ(θ) = ∫ ψ(t)D(θ) t e^{θt} h1(t) dt + ∫ ψ(t)D′(θ) e^{θt} h1(t) dt (2.9)
= Eθ [T ψ(T)] + (D′(θ)/D(θ)) Eθ [ψ(T)]. (2.10)
Note that the above identity is valid for any test ψ. In particular, by taking ψ(T) ≡ α, we have β(θ) ≡ α. Then the above identity reduces to
0 = Eθ [T α] + (D′(θ)/D(θ)) α, i.e., 0 = Eθ T + D′(θ)/D(θ).
From this, (2.6) and (2.8), we get 0 = β′(θ0) = Eθ0 [T ψ(T)] − α Eθ0 [T].
Lemma 5.2.2 Let M be the set of points (Eθ0 [ψ(T )], Eθ0 [T ψ(T )]) as ψ ranges
over the totality of critical functions. We’ll now show that the point (α, αEθ0 (T ))
is an inner point of M .
We now return to the main proof of the theorem. Consider finding a UMP test for simple
hypotheses
H0 : θ = θ0,
H1 : θ = θ′, where θ′ ≠ θ0,
subject to (2.6) and (2.7). Since the point (α, α Eθ0(T)) is an inner point of M, then by the GNP lemma, there exist constants k1 and k2 and a test ψ(t) such that
ψ(t) = 1 if D(θ′)e^{θ′t} > (k1 + k2 t)D(θ0)e^{θ0 t}, i.e., if e^{bt} > a1 + a2 t (with b = θ′ − θ0); and ψ(t) = 0 if the reverse strict inequality holds.
Note that the left-hand side of the inequality e^{bt} > a1 + a2 t is exponential in t and the right-hand side is linear. Thus the rejection region is either one-sided or the outside of an interval. The former is impossible, since then the test would have a strictly monotone power function and therefore could not possibly be unbiased. Therefore, we get
ψ(t) = 1 if t < C1 or t > C2; = γi if t = Ci; = 0 if C1 < t < C2,
where C1 < C2 and the γi are determined by (2.6) and (2.7). The proof of part (i) then follows from the lemma in the last section.
(ii). The power function β(θ) = Eθ [ψ(T )] is a continuous function and has a minimum
at θ = θ0 (see the proof in Lemma 5.2.1). Thus there exists θ1 < θ2 such that θ0 ∈ [θ1 , θ2 ]
and β(θ1) = β(θ2) = c, where α ≤ c < 1. Therefore, the test ψ is also a UMPU test of size c
for testing H0 : θ1 ≤ θ ≤ θ2 , and it follows from the last theorem that the power increases
strictly as θ moves away from θ0 in either direction.
Remark 5.2.1
(1). The side conditions
Eθ0 [ψ(T)] = α,
Eθ0 [T ψ(T)] = Eθ0 T · Eθ0 [ψ(T)] = α Eθ0 T (i.e., Covθ0(T, ψ(T)) = 0)
are equivalent to
Eθ0 [1 − ψ(T)] = 1 − α,
Eθ0 [T(1 − ψ(T))] = Eθ0 T · Eθ0 [1 − ψ(T)] = (1 − α) Eθ0 T (i.e., Covθ0(T, 1 − ψ(T)) = 0),
which is often simpler to use.
(2). The above two conditions can sometimes be simplified. If, for θ = θ0, the distribution of T is symmetric about some point a, that is, Pθ0(T < a − t) = Pθ0(T > a + t) for all t, then any test which is symmetric about a and satisfies (2.6) must also satisfy (2.7). To see this, first note that we can write ψ(t) = ψ0(t − a) and F(x) = F0(x − a), where ψ0 and F0 are both symmetric around 0, and hence
Eθ0 [(T − a)ψ(T)] = ∫ (t − a)ψ0(t − a)dF0(t − a) = ∫ t ψ0(t)dF0(t) = 0,
so that Eθ0 [T ψ(T)] = a Eθ0 [ψ(T)] = aα = α Eθ0 T.
Although it appears that there is only one condition, however, the other con-
dition is symmetry of the test, which is hidden.
(3). The results presented here can be generalized to the STPr family.
(4). If the density has the form fθ(x) = D(θ)h(x)e^{Q(θ)T(x)} and η = Q(θ) is strictly monotone, similar results in this section are also true, since we can test η first and transform back to θ. One such example is the Poisson example.
Consider the family fθ(x) = D(θ)h(x)e^{Q(θ)T(x)} and the one-sided problem
H0 : θ ≤ θ0, versus H1 : θ > θ0.
It is known from the earlier chapter that for any 0 < α < 1, there exists a UMP test φ of size α of the form
φ(x) = ψ(t) = 1 if t > C; = γ if t = C; = 0 if t < C, (2.11)
where C and γ are determined by Eθ0 φ(X) = α, and the power function βφ(θ) = Eθ φ(X) is strictly increasing.
Lemma 5.2.3 (Lehmann, 1986, p. 117, Question 22.) If Q(θ) is differentiable, then β′φ(θ) > 0 for all θ for which Q′(θ) > 0.
Proof. First,
βψ(θ) = Eθ [ψ(T)] = ∫ ψ(t)D(θ)e^{Q(θ)t}h1(t) dt.
For exponential family, it can be shown that β(θ) is differentiable, and the order of
integration and differentiation can be interchanged, so that for all tests ψ(t), we have
β′ψ(θ) = ∫ ψ(t)D(θ)Q′(θ) t e^{Q(θ)t}h1(t) dt + ∫ ψ(t)D′(θ)e^{Q(θ)t}h1(t) dt
= Q′(θ) Eθ [T ψ(T)] + (D′(θ)/D(θ)) Eθ [ψ(T)].
For ψ(T) ≡ α, we have βα(θ) ≡ α, then the above becomes
0 = Q′(θ) Eθ T + D′(θ)/D(θ).
Substituting this into the expression for β′ψ(θ) gives
β′ψ(θ) = Q′(θ){Eθ [T ψ(T)] − Eθ T · Eθ [ψ(T)]}. (2.12)
For any point θ0, consider the problem of maximizing the derivative β′(θ0) subject to βψ(θ0) = Eθ0 ψ(T) = α, where 0 < α < 1. In light of (2.12), this is equivalent to maximizing Eθ0 [T ψ(T)] subject to Eθ0 ψ(T) = α, when Q′(θ0) > 0. However, since 0 < α < 1, α is certainly an inner point of M = {∫ φ(x)fθ0(x)dx : 0 ≤ φ ≤ 1} = [0, 1]. Then by the GNP lemma, there exist a constant C and an optimal test ψ of the form
ψ(t) = 1 if t > C; = γ if t = C; = 0 if t < C,
where C and γ are determined from Eθ0 ψ(T) = α. However, this is exactly the same test as the one in (2.11). We already knew that its power function is strictly increasing, which implies that β′(θ0) ≥ 0. Again from (2.12), this means
Eθ0 [T ψ(T)] ≥ α Eθ0 [T].
We now show that the equality cannot hold here: if Eθ0 [T ψ(T)] = α Eθ0 [T], then ψ0(t) ≡ α would also be an optimal test. From the GNP lemma, the optimal test is unique almost surely; this would force T(X) = C a.s., which is impossible. So we get
Eθ0 [T ψ(T)] > α Eθ0 [T], and hence β′φ(θ0) = Q′(θ0){Eθ0 [T ψ(T)] − α Eθ0 T} > 0 whenever Q′(θ0) > 0.
5.2.4 Some examples
Example 1. (Binomial.) Let X ∼ Bin(n, p). Find a UMPU test of size α for testing
H0 : p = p 0 , versus H1 : p 6= p0 ,
Solution. Write
f(x) = C_n^x p^x (1 − p)^{n−x} = C_n^x exp{x ln(p/(1 − p)) + n ln(1 − p)}.
So it is in an exponential family with T(X) = X and θ = Q(p) = ln(p/(1 − p)), which is strictly increasing since Q′(p) = 1/p + 1/(1 − p) > 0. So testing
H0 : p = p0, versus H1 : p ≠ p0
is equivalent to testing
H0 : θ = θ0, versus H1 : θ ≠ θ0.
The UMPU test of size α is then
ψ(x) = 1 if x < C1 or x > C2; = γi if x = Ci; = 0 if C1 < x < C2,
where the constants are determined by (2.6) and (2.7), or equivalently
Σ_{x=C1+1}^{C2−1} C_n^x p0^x (1 − p0)^{n−x} + Σ_{i=1}^2 (1 − γi) C_n^{Ci} p0^{Ci} (1 − p0)^{n−Ci} = 1 − α,
Σ_{x=C1+1}^{C2−1} x C_n^x p0^x (1 − p0)^{n−x} + Σ_{i=1}^2 (1 − γi) Ci C_n^{Ci} p0^{Ci} (1 − p0)^{n−Ci} = np0(1 − α).
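The two displayed side conditions determine (C1, C2, γ1, γ2) only implicitly. The following brute-force sketch (ours, not from the notes; numpy/scipy assumed, function name ours) solves the 2×2 linear system in (1 − γ1, 1 − γ2) for each candidate pair (C1, C2) and keeps the pair for which both γi land in [0, 1]:

import numpy as np
from scipy.stats import binom

def umpu_binomial(n, p0, alpha):
    """Constants (C1, C2, gamma1, gamma2) of the two-sided UMPU test."""
    pmf = binom.pmf(np.arange(n + 1), n, p0)
    for c1 in range(n + 1):
        for c2 in range(c1 + 1, n + 1):
            ks = np.arange(c1 + 1, c2)
            inner = pmf[c1 + 1:c2]                 # mass strictly between C1 and C2
            s0, s1 = inner.sum(), (ks * inner).sum()
            A = np.array([[pmf[c1], pmf[c2]],
                          [c1 * pmf[c1], c2 * pmf[c2]]])
            b = np.array([(1 - alpha) - s0, n * p0 * (1 - alpha) - s1])
            try:
                u = np.linalg.solve(A, b)          # u = (1 - gamma1, 1 - gamma2)
            except np.linalg.LinAlgError:
                continue
            if np.all((u >= 0) & (u <= 1)):
                return c1, c2, 1 - u[0], 1 - u[1]
    return None

print(umpu_binomial(10, 0.3, 0.05))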
Example 2. (Normal variance.) Let X1, . . . , Xn ∼ N(0, σ²). Find a UMPU test of size α for testing (1) H0 : σ² = σ0² v.s. H1 : σ² ≠ σ0², and a UMP test for (2) H0 : σ² ≤ σ0² v.s. H1 : σ² > σ0².
Solution. The distribution belongs to the exponential family with sufficient statistic T(X) = Σ Xi². Under H0, T/σ0² has p.d.f.:
χ²n(t) = (1/(2^{n/2} Γ(n/2))) t^{n/2−1} e^{−t/2}, t > 0.
Also note that
t χ²n(t) = (1/(2^{n/2} Γ(n/2))) t^{(n+2)/2−1} e^{−t/2}
= [2^{(n+2)/2} Γ((n + 2)/2) / (2^{n/2} Γ(n/2))] · (1/(2^{(n+2)/2} Γ((n + 2)/2))) t^{(n+2)/2−1} e^{−t/2}
= n χ²_{n+2}(t), (as Γ(a + 1) = aΓ(a).) (2.13)
(1). A UMPU test of size α for testing H0 : σ 2 = σ02 v.s. H1 : σ 2 6= σ02 is
ψ(t) = I{T(x) < D1, or T(x) > D2}
= 1 − I{D1 ≤ T(x) ≤ D2}
= 1 − I{C1 ≤ T(x)/σ0² ≤ C2},
where the C's are determined from
Eσ0 [1 − ψ(T)] = 1 − α,
Eσ0 [(T/σ0²)(1 − ψ(T))] = Eσ0 [T/σ0²] · Eσ0 [1 − ψ(T)] = n(1 − α),
or equivalently
∫_{C1}^{C2} χ²n(t)dt = 1 − α,
∫_{C1}^{C2} t χ²n(t)dt = n ∫_{C1}^{C2} χ²_{n+2}(t)dt = n(1 − α),
or equivalently
∫_{C1}^{C2} χ²n(t)dt = ∫_{C1}^{C2} χ²_{n+2}(t)dt = 1 − α.
If n is large such that (n + 2)/n ∼ 1, then C1 ≈ χ2n (α/2) and C2 ≈ χ2n (1 − α/2),
respectively. This is roughly the “equal-tailed” chi-square test in elementary statistics.
As a final remark, one could also integrate ∫_{C1}^{C2} t χ²_n(t) dt by parts to reduce the second condition to
C1^{n/2} e^{−C1/2} = C2^{n/2} e^{−C2/2}.
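As a hedged numerical sketch (ours, not from the notes), the two integral conditions can be solved with a root-finder, starting from the equal-tailed cuts:

```python
from scipy.stats import chi2
from scipy.optimize import fsolve

def umpu_variance_cuts(n, alpha):
    """Solve CDF_n(C2) - CDF_n(C1) = CDF_{n+2}(C2) - CDF_{n+2}(C1) = 1 - alpha."""
    def eqs(c):
        c1, c2 = c
        return [chi2.cdf(c2, n) - chi2.cdf(c1, n) - (1 - alpha),
                chi2.cdf(c2, n + 2) - chi2.cdf(c1, n + 2) - (1 - alpha)]
    start = [chi2.ppf(alpha / 2, n), chi2.ppf(1 - alpha / 2, n)]  # equal-tailed guess
    return fsolve(eqs, start)

print(umpu_variance_cuts(10, 0.05))  # close to, but not equal to, the equal-tailed cuts
```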
(2). Similarly to (1), a UMP test of size α for testing H0 : σ² ≤ σ0² v.s. H1 : σ² > σ0² is
ψ(t) = I{T(x)/σ0² > C0},
where C0 is determined from
Eσ0[1 − ψ(T)] = ∫_0^{C0} χ²_n(t) dt = 1 − α.
In other words, the critical value is C0 = χ²_n(1 − α). This is exactly the chi-square test used in elementary statistics.
Example 3. (Poisson.) Let X ∼ Poisson(λ). Find a UMPU test of size α for testing
H0 : λ = λ0 , versus H1 : λ ≠ λ0 .
Solution. Similar to Example 1.
5.3 UMPU tests for multiparameter exponential families
In many important problems the hypothesis concerns a single real-valued parameter, but the distribution also depends on certain other nuisance parameters. The Similarity Lemma (S.O.B.) is still the main tool for finding UMPU tests in multi-parameter exponential families; however, it is NOT as easy to use here as in one-parameter exponential families. We shall instead use another, much easier condition: “Neyman structure” (N.S.).
The relationship between N.S. and S.O.B. gives
“UMP N.S.” ⇐⇒ “UMP S.O.B.” =⇒ “UMPU”.
(b). The family is called boundedly complete if for every bounded function g(t),
Eθ g(T) = 0 for all θ =⇒ Pθ(g(T) = 0) = 1 for all θ.
Remark 5.3.1 If the range of (η1(θ), · · · , ηk(θ)) contains an open k-dimensional rectangle, then for any fixed θ0 it can be shown that η1(θ) − η1(θ0), · · · , ηk(θ) − ηk(θ0) are linearly independent. Then the family is of full rank k.
Example. Assume X1, . . . , Xn ∼ Bin(1, p). Then T = Σ Xi ∼ Bin(n, p) for 0 < p < 1. Show that T is complete.
Proof. The conclusion is obvious by applying the last theorem.
Alternatively, we can prove it directly. If Ep g(T) = 0 for all 0 < p < 1, then, writing q = 1 − p and r = p/q,
Σ_{t=0}^{n} C_n^t p^t q^{n−t} g(t) = q^n Σ_{t=0}^{n} C_n^t r^t g(t) = 0.
Thus the polynomial Σ_t C_n^t g(t) r^t vanishes for every r > 0, so all of its coefficients are zero, i.e., g(t) = 0 for t = 0, 1, . . . , n. Therefore, T is complete.
Example. Show that the minimal sufficient statistic for U (θ, θ + 1) is not complete.
Solution. It has been shown that the minimal sufficient statistic is T = (X(1), X(n)).
Since F(x) = (x − θ)I(θ < x < θ + 1) + I(x ≥ θ + 1), we get P(X(n) ≤ x) = F(x)^n. Therefore,
E(X(n)) = ∫ x dF^n(x) = ∫_θ^{θ+1} n x (x − θ)^{n−1} dx = n ∫_0^1 (y + θ) y^{n−1} dy = n/(n + 1) + θ = (θ + 1) − 1/(n + 1).
Similarly, we can get E(X(1)) = θ + 1/(n + 1). Therefore,
E[ X(n) − X(1) − (n − 1)/(n + 1) ] = 0.
But X(n) − X(1) − (n − 1)/(n + 1) is not identically 0. So T is not complete.
Remark 5.3.2 This example shows that minimal sufficiency does not imply complete-
ness.
Example. Show that T = (Xn, Σ_{i=1}^{n−1} Xi) for Bin(1, p) is not complete.
Solution. E[ Xn − Σ_{i=1}^{n−1} Xi/(n − 1) ] = 0, but Xn − Σ_{i=1}^{n−1} Xi/(n − 1) is not identically 0. So T is not complete.
Example. Let X1, . . . , Xn be iid N(θ, θ), θ > 0. Show that the statistic T = (X̄, S²) is a sufficient statistic for θ, but the family of distributions is not complete. (This is an example where the family of distributions is exponential, but it is not of full rank.)
Solution. E[X̄ − S²] = θ − θ = 0, but X̄ − S² is not identically 0. So T is not complete.
That is,
Σ_{n=1}^∞ g(n) θ^{n−1} = −g(0) θ/(1 − θ)² = −Σ_{n=1}^∞ n g(0) θ^n,
or
g(1) + g(2)θ + g(3)θ² + · · · = −g(0)θ − 2g(0)θ² − 3g(0)θ³ − · · · ,
thus
g(1) = 0, g(n) = −(n − 1) g(0), n = 2, 3, 4, . . . .
If g is required to be bounded, then g(0) = 0, and thus g(n) = 0 for all n; the family is therefore boundedly complete.
5.3.2 Similarity and Neyman Structure
We have seen earlier that similarity on the boundary is useful in deriving a UMPU test
(see the Similarity Lemma) when there is only one parameter of interest and no other
nuisance parameters. When there are nuisance parameters, there is a simpler and easier
condition to use (Neyman structure) when the family has a boundedly complete and
sufficient statistic.
Suppose that the observation X ∼ Fθ . We shall now concentrate on the family
FX = {Fθ : θ ∈ ω}, where ω is a parameter set, and in particular, will be chosen
to be the boundary of Θ0 and Θ1. Let T be a sufficient statistic for θ with distribution function FθT. Let FT = {FθT : θ ∈ ω}.
Definition. A test φ is said to have Neyman structure with respect to a sufficient statistic
T if
E [φ(X)|T ] = α, a.s. FT
(i.e. it holds except on a set N with P (N ) = 0 for all P ∈ FT )
Remark 5.3.3 .
The reverse is also true provided that T is also boundedly complete. See the
lemma below.
(iii). It is often easier to obtain a UMP test having Neyman structure (as
it can be done on each surface). Then the resulting test is UMPU if every
similar test has Neyman structure.
Proof. Suppose that FX = {Fθ : θ ∈ ω} is boundedly complete and let φ(X) be similar
w.r.t. ω, that is, Eθ φ(X) = α for all θ ∈ ω, or equivalently
E (φ(X)|T ) − α = 0, a.s. FT .
Conversely, suppose that FX = {Fθ : θ ∈ ω} is NOT boundedly complete. Then there exists a bounded function g, say |g(t)| ≤ M for some M, such that Eθ g(T) = 0 for all θ ∈ ω, but Pθ0(g(T) = 0) < 1 for some θ0 ∈ ω. Now define
ψ(t) = α + c g(t), where c = min(α, 1 − α)/M, so that 0 ≤ ψ(t) ≤ 1.
Then we have Eθ ψ(T) = α for all θ ∈ ω, so ψ is similar on ω; but E[ψ(T)|T] = ψ(T) differs from α with positive probability under θ0, so ψ does not have Neyman structure.
In search of UMPU tests for θ, attention can be restricted to the sufficient statistic (U, T), which has joint p.d.f.
gθ,ν(u, t) = C(θ, ν) exp{ θu + Σ_{i=1}^k νi ti } h1(u, t), (θ, ν) ∈ Ω.
Also, the conditional p.d.f. of U given T = t is
qθ(u|t) = gθ,ν(u, t) / ∫ gθ,ν(u, t) du
        = C(θ, ν) exp{θu} exp{Σ_{i=1}^k νi ti} h1(u, t) / ∫ C(θ, ν) exp{θu} exp{Σ_{i=1}^k νi ti} h1(u, t) du
        = exp{θu} h1(u, t) / ∫ exp{θu} h1(u, t) du
        = Ct(θ) exp{θu} ht(u), θ ∈ Θ. (3.15)
Eθ1 {ψ2 (U, T )|T = t} = Eθ2 {ψ2 (U, T )|T = t} = α, for all t.
Eθ1 {ψ3 (U, T )|T = t} = Eθ2 {ψ3 (U, T )|T = t} = α, for all t.
(4). For testing H0 : θ = θ0 v.s. H1 : θ 6= θ0 ,
The next theorem states that these tests are in fact UMPU tests unconditionally.
Theorem 5.3.3 Suppose that X is from the exponential family (3.14) of full rank. Then
the tests ψi (u, t), i = 1, 2, 3, 4 are UMPU tests of size α for the respective hypotheses.
Proof. We only need to concentrate on test functions based on the sufficient statistics
(U, T ). It is easy to show that the power function of any test ψ, Eθ,ν ψ(U, T ), is continuous
in (θ, ν), so one can apply Lemma 5.1.2 (i.e., Similarity Lemma). Note that the power
function of a test ψ against an alternative (θ, ν) ∈ Ω1 is
βψ(θ, ν) ≡ Eθ,ν[ψ(U, T)] = ∫∫ ψ(u, t) gθ,ν(u, t) du dt
         = ∫ [ ∫ ψ(u, t) qθ(u|t) du ] rθ,ν(t) dt (3.16)
         =: ∫ βψ(θ|t) rθ,ν(t) dt,
where βψ(θ|t) is the power of ψ conditional on T = t, and rθ,ν(t) is the marginal p.d.f. of T given by
rθ,ν(t) = ∫ gθ,ν(u, t) du = C(θ, ν) exp{Σ_{i=1}^k νi ti} ∫ exp{θu} h1(u, t) du
        = C(θ, ν) exp{Σ_{i=1}^k νi ti} h3(t).
Let Ω be the whole parameter space of (θ, ν), which is necessarily convex. Clearly the boundary sets for i = 1, 2, 3, 4 are
ω1 = ω4 = {(θ, ν) ∈ Ω : θ = θ0},
ω2 = ω3 = {(θ, ν) ∈ Ω : θ = θ1 or θ = θ2}.
In order to show that ψ is UMPU, we only need to show that it is UMP S.O.B.
(1). Let us consider the cases i = 1, 2, 3 first. For simplicity, we shall only consider the case i = 1; the other cases follow similarly. By the Similarity Lemma, we consider the problem of maximizing βψ(θ, ν) = Eθ,ν ψ(U, T) subject to similarity on ω1,
or equivalently
Eθ0,ν( Eθ0[ψ(U, T)|T] − α ) = 0.
For fixed θ0, T is complete and sufficient for ν. In view of this and (3.16), the problem boils down to maximizing the conditional power on T = t:
βψ(θ0|t) = ∫ ψ(u, t) qθ0(u|t) du for every t,
subject to
Eθ0[ψ(U, T)|T = t] = α for all t.
(Note the last equation does not depend on ν since T is sufficient for ν.)
Since U follows a one-parameter exponential family conditional on T = t, ψ1(u, t) is such a solution, and hence a UMPU test.
(2). Let us now consider the case i = 4. First, unbiasedness of a test ψ implies similarity on ω4. Similarly to the one-parameter exponential family, one can show that (3.19) is equivalent to
Eθ0,ν[ Uψ(U, T) − αU ] = 0.
Then we can rewrite (3.18) and (3.19) as conditional constraints given T. For fixed θ0, the statistic T is complete and sufficient for ν. Thus, we have (a.s.)
Eθ0[ψ(U, T)|T] = α and Eθ0[Uψ(U, T)|T] = α Eθ0[U|T].
5.3.4 UMPU tests for linear combinations of parameters in mul-
tiparameter exponential families
Through a transformation of parameters, the theorem in the last section can be used to
find UMPU tests for parameters of the form
θ* = a0 θ + Σ_{i=1}^k ai νi , a0 ≠ 0,
where
U* = U/a0 , Ti* = Ti − (ai/a0) U.
Proof.
fθ,ν(x) = C(θ, ν) exp{ (θ* − Σ_{i=1}^k ai νi) U(x)/a0 + Σ_{i=1}^k νi Ti(x) } h(x)
        = C(θ, ν) exp{ θ* U*(x) + Σ_{i=1}^k νi [ Ti(x) − (ai/a0) U(x) ] } h(x)
        = C(θ, ν) exp{ θ* U*(x) + Σ_{i=1}^k νi Ti*(x) } h(x).
Remark 5.3.4 .
(i). βψ(θ0|t) can be interpreted as the conditional probability of rejecting H0 once we observe T = t. It is an unbiased estimator of the true unknown power βψ(θ0, ν), since
Eθ0,ν[βψ(θ0|T)] = βψ(θ0, ν).
(ii). One disadvantage of the conditional power is that it is available only after the observation is taken. So it cannot be used to plan the experiment in advance, in particular to determine the sample size.
5.3.6 Some examples.
Example. (Ratio of two Poisson means.) Let X1 ∼ Poisson(λ1) and X2 ∼ Poisson(λ2) be independent. Their joint p.d.f. is
f(x1, x2) = [e^{−(λ1+λ2)}/(x1! x2!)] exp{ x1 log(λ1) + x2 log(λ2) }
          = [e^{−(λ1+λ2)}/(x1! x2!)] exp{ x2 log(λ2/λ1) + (x1 + x2) log(λ1) }.
Here U = X2 ∼ P(λ2), T = X1 + X2 ∼ P(λ1 + λ2), and θ = log(λ2/λ1). So we can find UMPU tests for θ = log(λ2/λ1), or η = λ2/λ1. Therefore,
H0 : λ2 = a0 λ1 , versus H1 : λ2 ≠ a0 λ1
is equivalent to
H0 : θ = ln a0 = θ0 , versus H1 : θ ≠ ln a0 = θ0 .
Now we can apply the theorem in this section. But we need the following result:
{U | T = t} = {X2 | (X1 + X2) = t} ∼ Bin( t, p = λ2/(λ1 + λ2) ).
Under H0, we have p = a0/(1 + a0) ≡ p0.
The UMPU test rejects, conditionally on T = t, when U < C1(t) or U > C2(t) (randomizing at the endpoints), where the Ci(t)'s and γi(t)'s are determined from the two side conditions, or
Σ_{k=C1(t)+1}^{C2(t)−1} C_t^k p0^k (1 − p0)^{t−k} + Σ_{i=1}^{2} [1 − γi(t)] C_t^{Ci(t)} p0^{Ci(t)} (1 − p0)^{t−Ci(t)} = 1 − α,
Σ_{k=C1(t)+1}^{C2(t)−1} k C_t^k p0^k (1 − p0)^{t−k} + Σ_{i=1}^{2} [1 − γi(t)] Ci(t) C_t^{Ci(t)} p0^{Ci(t)} (1 − p0)^{t−Ci(t)} = (1 − α) t p0 .
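In practice, the randomized conditional test is often replaced by an exact (equal-tailed, non-randomized) binomial test of p = p0 given T = t. A minimal sketch of that practical stand-in (ours, not the notes'):

```python
from scipy.stats import binomtest

def poisson_ratio_test(x1, x2, a0=1.0):
    """Conditional test of H0: lambda2 = a0*lambda1, given T = x1 + x2.
    Under H0, X2 | T = t ~ Bin(t, p0) with p0 = a0/(1 + a0)."""
    t = x1 + x2
    p0 = a0 / (1.0 + a0)
    return binomtest(x2, t, p0).pvalue

print(poisson_ratio_test(10, 25, a0=1.0))  # small p-value suggests lambda2 != lambda1
```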
Example. (Testing for odds ratio.) Let X1 ∼ Bin(m, p1 ), X2 ∼ Bin(n, p2 ), and X1
and X2 are independent. Let qi = 1 − pi , i = 1, 2. We wish to test H0 : p2 /q2 = a0 p1 /q1
versus H1 : p2 /q2 6= a0 p1 /q1 for some known a0 . Find a UMPU test of size α.
Solution. Write qi = 1 − pi, i = 1, 2. The joint p.d.f. of (X1, X2) is
f(x1, x2) = C_m^{x1} p1^{x1} q1^{m−x1} · C_n^{x2} p2^{x2} q2^{n−x2}
          = C_m^{x1} C_n^{x2} q1^m q2^n (p1/q1)^{x1} (p2/q2)^{x2}
          = C_m^{x1} C_n^{x2} q1^m q2^n exp{ x2 [ log(p2/q2) − log(p1/q1) ] + (x1 + x2) log(p1/q1) }.
Here U = X2 ∼ Bin(n, p2), T = X1 + X2, and θ = log[(p2/q2)/(p1/q1)] = log(p2 q1/(p1 q2)). So we can find UMPU tests for θ. Therefore, the original testing problem becomes
H0 : θ = θ0 = ln a0 , versus H1 : θ ≠ θ0 .
Now we can apply the theorem in this section. But we need to find the conditional distribution
Pθ(U = u|T = t) = Pθ(X2 = u|(X1 + X2) = t) = Pθ(X2 = u, X1 = t − u) / Pθ(X1 + X2 = t)
 = Pθ(X2 = u) Pθ(X1 = t − u) / Σ_{k=0}^{t} Pθ(X2 = k) Pθ(X1 = t − k)
 = C_m^{t−u} p1^{t−u} q1^{m−t+u} C_n^{u} p2^{u} q2^{n−u} / Σ_{k=0}^{t} C_m^{t−k} p1^{t−k} q1^{m−t+k} C_n^{k} p2^{k} q2^{n−k}
 = C_m^{t−u} C_n^{u} p1^{−u} q1^{u} p2^{u} q2^{−u} / Σ_{k=0}^{t} C_m^{t−k} C_n^{k} p1^{−k} q1^{k} p2^{k} q2^{−k}
 = C_m^{t−u} C_n^{u} e^{uθ} / Σ_{k=0}^{t} C_m^{t−k} C_n^{k} e^{kθ} .
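This conditional distribution is easy to evaluate directly; the following sketch (ours, with our own function name) normalizes the weights over the support of U given T = t:

```python
import numpy as np
from scipy.special import comb

def cond_pmf_odds(u, t, m, n, theta):
    """P(U = u | T = t) = C(m, t-u) C(n, u) e^{u*theta} / normalizer."""
    k = np.arange(max(0, t - m), min(t, n) + 1)   # support of U given T = t
    w = comb(m, t - k) * comb(n, k) * np.exp(k * theta)
    return comb(m, t - u) * comb(n, u) * np.exp(u * theta) / w.sum()

print(cond_pmf_odds(3, 6, m=10, n=8, theta=0.0))  # theta = 0 gives the hypergeometric
```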
Example. (Testing for independence in a 2×2 contingency table.) Let A and B be two different events in a probability space related to a random experiment. Suppose that n independent trials of the experiment are carried out and that we observe the frequencies of occurrence of the events A ∩ B, A ∩ Bᶜ, Aᶜ ∩ B and Aᶜ ∩ Bᶜ. The results can be summarized in the following 2 × 2 contingency table.
A Ac Total
B X11 X12 n1
Bc X21 X22 n2
Total m1 m2 n
We wish to test H0 : A and B are independent, versus H1 : they are not.
Solution. Note X11 + X12 + X21 + X22 = n and p11 + p12 + p21 + p22 = 1. The p.d.f. of X = (X11, X12, X21, X22) is multinomial with probabilities p = (p11, p12, p21, p22), where p = EX/n. That is,
f(x) = [n!/(x11! x12! x21! x22!)] p11^{x11} p12^{x12} p21^{x21} p22^{x22}
     = [n!/(x11! x12! x21! x22!)] p22^{n} exp{ x11 log(p11/p22) + x12 log(p12/p22) + x21 log(p21/p22) },
which is in an exponential family. So we can derive UMPU tests for any parameter of the form
θ = a0 log(p11/p22) + a1 log(p12/p22) + a2 log(p21/p22), (3.20)
where the ai's are known constants.
Now A and B are independent (which implies that A and Bᶜ, Aᶜ and B, and Aᶜ and Bᶜ are independent) iff P(A ∩ B) = P(A)P(B), which reduces to p11 p22 = p12 p21, i.e., to θ = 0 in (3.20) with a0 = 1, a1 = a2 = −1:
θ = log( p11 p22 / (p12 p21) ).
So A and B are independent iff θ = 0.
Now rewrite
f(x) = [n!/(x11! x12! x21! x22!)] p22^{n} exp{ x11 θ + (x11 + x12) log(p12/p22) + (x11 + x21) log(p21/p22) },
so we have U = X11 ∼ Bin(n, p11) and T = (X11 + X12, X11 + X21). Then
P(U = u|T = t) = P(X11 = u|X11 + X12 = t1, X11 + X21 = t2)
 = P(X11 = u, X12 = t1 − u, X21 = t2 − u) / Σ_{k=0}^{t1∧t2} P(X11 = k, X12 = t1 − k, X21 = t2 − k)
 = [n!/(u!(t1−u)!(t2−u)!(n+u−t1−t2)!)] p11^{u} p12^{t1−u} p21^{t2−u} p22^{n+u−t1−t2} / Σ_{k=0}^{t1∧t2} [n!/(k!(t1−k)!(t2−k)!(n+k−t1−t2)!)] p11^{k} p12^{t1−k} p21^{t2−k} p22^{n+k−t1−t2}
 = [n!/(u!(t1−u)!(t2−u)!(n+u−t1−t2)!)] p11^{u} p12^{−u} p21^{−u} p22^{u} / Σ_{k=0}^{t1∧t2} [n!/(k!(t1−k)!(t2−k)!(n+k−t1−t2)!)] p11^{k} p12^{−k} p21^{−k} p22^{k}
 = C_{t1}^{u} C_{n−t1}^{t2−u} e^{uθ} / Σ_{k=0}^{t1∧t2} C_{t1}^{k} C_{n−t1}^{t2−k} e^{kθ} , (i.e. the noncentral hypergeometric distribution).
Note that under H0, A and B are independent ⇐⇒ θ = 0 = θ0, and therefore
Pθ0(U = u|T = t) = C_{t1}^{u} C_{n−t1}^{t2−u} / Σ_{k=0}^{t1∧t2} C_{t1}^{k} C_{n−t1}^{t2−k} = C_{t1}^{u} C_{n−t1}^{t2−u} / C_{n}^{t2} . (3.21)
Remark 5.3.6 Note that the above UMPU test is conditional on T = (X11 + X12, X11 + X21), i.e., the marginal sums are fixed; the only free r.v. is then U = X11. In particular, when η ≡ e^θ = 1, (3.21) is the hypergeometric distribution, which is exactly the same as that in the last example for testing odds ratios in the binomial case. The UMPU test when η = 1 is also called Fisher's exact test. In practice, an equal-tailed two-sided version of Fisher's test is often employed to simplify the calculations, so it is only an approximately optimal test.
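For illustration, scipy implements both the equal-tailed two-sided Fisher test and the null hypergeometric law (3.21); the table entries below are made up:

```python
import numpy as np
from scipy.stats import fisher_exact, hypergeom

table = np.array([[8, 2],
                  [1, 5]])            # [[X11, X12], [X21, X22]]
odds, p = fisher_exact(table)         # equal-tailed two-sided Fisher's exact test
print(odds, p)

# the null conditional distribution (3.21): U | T is hypergeometric
n = table.sum(); t1 = table[0].sum(); t2 = table[:, 0].sum()
print(hypergeom.pmf(table[0, 0], n, t1, t2))
```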
Remark 5.3.7 Another approximate test is the χ2 test. However, it is not as powerful.
5.4 Summary
We derived UMPU tests for one- and multi-parameter exponential families. The critical notions are similarity on the boundary (S.O.B.) and Neyman structure (N.S.), respectively.
5.5 Exercises
1. Let X1 , ..., X10 be iid Bin(1, p).
(i). Find a UMP test of size α = 0.1 for testing H0 : p ≤ 0.2 or p ≥ 0.7
v.s. H1 : 0.2 < p < 0.7.
(ii). Find the power of the UMP test in (i) when p = 0.4.
(iii). Find a UMPU test of size α = 0.1 for testing H0 : p = 0.2 v.s.
H1 : p 6= 0.2
(iv). Find the power of the UMPU test in (iii) when p = 0.4.
2. Let X1 , ..., Xn be iid from some distribution function Fθ (x). Find a UMPU test for
testing H0 : θ = θ0 v.s. H1 : θ 6= θ0 if
Chapter 6
Unbiased tests for special families
UMPU tests for (multi-parameter) exponential families have been found in the last chap-
ter. For some special cases, such as normal families, the UMPU tests given in the last
chapter (usually in conditional format) may sometimes be simplified to the familiar tests
we studied in elementary statistics courses. So the results here justify some of the tests
used in elementary textbooks.
Remark 6.1.1 .
Since T is boundedly complete and sufficient, we get
(i). If Y1, . . . , Yn have a completely known joint pdf f(y1, . . . , yn), then T = g(Y1, . . . , Yn) is ancillary.
(ii). If Y1, . . . , Yn are independent with completely known pdfs f1(y), . . . , fn(y), respectively, then T = g(Y1, . . . , Yn) is also ancillary.
Proof. For (i) and (ii), we have, respectively
Z
P (T ≤ t) = f (y1 , ..., yn )dy1 · · · dyn
g(y1 ,...,yn )≤t
and Z
P (T ≤ t) = f1 (y1 ) · · · fn (yn )dy1 · · · dyn .
g(y1 ,...,yn )≤t
In either case, these probabilities do not depend on any unknown parameters, and hence T is ancillary. The proof is complete.
Remark 6.1.2 If Y1, . . . , Yn are dependent and each is ancillary individually, then T = g(Y1, . . . , Yn) may not be ancillary. In other words, marginal ancillarity does not imply joint ancillarity. For instance, one such example is the bivariate normal (X, Y) with standard normal marginals and correlation coefficient ρ ≠ 0.
Clearly, the distribution of V = S 2 does not depend on µ (since Yi does not) and hence
V = S 2 is ancillary. Also it is known that T = X̄ is complete and sufficient. Hence the
result follows from the Basu’s Theorem.
(2). First we have
(n − 1)S²/σ0² = Σ_{i=1}^n ((Xi − X̄)/σ0)² = Σ_{i=1}^n ((Xi − µ)/σ0)² − (√n(X̄ − µ)/σ0)²,
or
W ≡ Σ_{i=1}^n ((Xi − µ)/σ0)² = (n − 1)S²/σ0² + (√n(X̄ − µ)/σ0)² ≡ W1 + W2.
It is known that
(a). W ∼ χ²_n with m.g.f. Ee^{tW} = (1 − 2t)^{−n/2},
(b). W2 ∼ χ²_1 with m.g.f. Ee^{tW2} = (1 − 2t)^{−1/2}.
Also, Ee^{tW} = Ee^{tW1+tW2} = Ee^{tW1} Ee^{tW2}, as W1 and W2 are independent. Therefore, the m.g.f. of W1 is
Ee^{tW1} = Ee^{tW}/Ee^{tW2} = (1 − 2t)^{−(n−1)/2}.
Thus, W1 ≡ (n − 1)S²/σ0² ∼ χ²_{n−1}.
(3). From √n(X̄ − µ)/σ0 ∼ N(0, 1) and (n − 1)S²/σ0² ∼ χ²_{n−1} and their independence, we have
√n(X̄ − µ)/S = [√n(X̄ − µ)/σ0] / (S/σ0) = [√n(X̄ − µ)/σ0] / √{[(n − 1)S²/σ0²]/(n − 1)} ∼ N(0, 1)/√(χ²_{n−1}/(n − 1)) ∼ t_{n−1}.
Example. Let X1, . . . , Xn ∼iid N(µx, σx²) and Y1, . . . , Yn ∼iid N(µy, σy²), with the X's and Y's independent. Show that T = (X̄, Sx², Ȳ, Sy²) is independent of the sample correlation coefficient defined by
V = Σ(Xi − X̄)(Yi − Ȳ) / √{ Σ(Xi − X̄)² Σ(Yi − Ȳ)² }.
Proof. First it can be shown that T = (X̄, Sx², Ȳ, Sy²) is sufficient and complete for θ = (µx, σx², µy, σy²).
Next define Ui = (Xi − µx)/σx and Wi = (Yi − µy)/σy, so Ui ∼ N(0, 1) and Wi ∼ N(0, 1). And we can rewrite
V = Σ(Ui − Ū)(Wi − W̄) / √{ Σ(Ui − Ū)² Σ(Wi − W̄)² },
which is clearly ancillary by the lemma following Basu's theorem. By Basu's Theorem, V and T are independent.
Example. Let U1/σ1² ∼ χ²_{f1} and U2/σ2² ∼ χ²_{f2} be independent. Suppose that σ2²/σ1² = a. Show that U2/U1 and T = aU1 + U2 are independent. In particular, if σ1 = σ2, then U2/U1 and U1 + U2 are independent.
Solution. First,
V ≡ U2/U1 = a (U2/σ2²)/(U1/σ1²), a ratio of independent χ² variables (a scaled F distribution),
which does not depend on σ1² or σ2²; hence V is ancillary. By Basu's Theorem, V and T are independent.
6.2 UMPU tests for multi-parameter exponential fam-
ilies
6.2.1 Review.
Let X be distributed according to the (k + 1)-parameter exponential family of full rank
fθ,ν(x) = C(θ, ν) exp{ θU(x) + Σ_{i=1}^k νi Ti(x) } h0(x), (θ, ν) ∈ Ω. (2.1)
Eθ1 {ψ2 (U, T )|T = t} = Eθ2 {ψ2 (U, T )|T = t} = α, for all t.
where the functions Ci (t)’s and γi (t)’s are determined from
Eθ1 {ψ3 (U, T )|T = t} = Eθ2 {ψ3 (U, T )|T = t} = α, for all t.
Theorem 6.2.1 Suppose that X is from the exponential family (2.1) of full rank, and
that V = h(U, T ) is independent of T when θ = θi , i = 0, 1, 2.
(a) If h(u, t) is increasing in u for each t, then UMPU tests of size α for the
first three hypothesis testing problems given in the last subsection are equivalent
to those with (U, T ) replaced by V and with Ci (t) and γi (t) replaced by Ci and
γi , where i = 1, 2, 3.
(b) If there are functions a(t) > 0 and b(t) such that
V = h(U, T ) = a(T )U + b(T ), i.e., V is linear in U for fixed T ,
then the UMPU test of size α for the fourth hypothesis testing problems given
in the last subsection is equivalent to that with (U, T ) replaced by V and with
Ci (t) and γi (t) replaced by Ci and γi , where i = 4.
For simplicity, we list these UMPU tests in more detail.
(1). A UMPU test for testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 is
ψ1(v) = 1 when v > C0, γ0 when v = C0, 0 when v < C0,
where Ci ’s and γi ’s are determined from
Eθ0 {ψ4 (V )} = α,
Eθ0 {V ψ4 (V )} = αEθ0 [V ] .
Proof. (1). First consider testing H0 : θ ≤ θ0 v.s. H1 : θ > θ0 . Since h(u, t) is increasing
in u for each t, then a UMPU test of size α
is equivalent to
Eθ0 {ψ1 (U, T )|T = t} = Eθ0 {ψ1 (V, t)|T = t} = Eθ0 {ψ1 (V, t)} = α,
that is,
Pθ0 (V > D0 (t)) + γ0 (t)Pθ0 (V = D0 (t)) = α.
Clearly, D0(t) and γ0(t) do not depend on t, as can be seen from the proof of the Neyman–Pearson Lemma. We denote D0(t) = C0 and γ0(t) = γ0. Therefore, from (2.3), ψ1(v, t) does not depend on t either. That is, a UMPU test of size α is
ψ1(v) = I{v > C0} + γ0 I{v = C0}.
For the case i = 4, the side conditions can be written as
Pθ0(D1(t) < V < D2(t)) + Σ_{i=1}^{2} [1 − γi(t)] Pθ0(V = Di(t)) = 1 − α,
∫_{D1(t)}^{D2(t)} v dFV(v) = (1 − α) Eθ0(V),
from which we can see that the Di(t)'s and γi(t)'s do not depend on t.
6.2.3 Some basic facts about bivariate normal distribution
We shall introduce some basic facts concerning normal and bivariate normal distributions,
which will be useful in some examples to follow.
Proof. The density of the Z's is obtained from that of the X's by substituting xi = Σ_{j=1}^n bij zj, where (bij) is the inverse of the matrix (aij), and multiplying by the Jacobian, which is 1.
Proof. If we make an orthogonal transformation such that Z1 = √n X̄, then, by AAᵀ = I,
Σ_{i=2}^n Zi² = Σ_{i=1}^n Zi² − Z1² = (Z1, . . . , Zn)(Z1, . . . , Zn)ᵀ − Z1²
             = (X1, . . . , Xn)AAᵀ(X1, . . . , Xn)ᵀ − Z1² = Σ_{i=1}^n Xi² − n X̄² = Σ_{i=1}^n (Xi − X̄)².
Theorem 6.2.3 Let (Xi, Yi) ∼ Bivariate Normal with parameters (µ1, µ2, σ1², σ2², ρ).
(1). Yi | Xi = x ∼ N( µ2 + (ρσ2/σ1)(x − µ1), σ2²(1 − ρ²) ).
(2). If Σ(xi − x̄)² > 0 (i.e., non-degenerate) and ρ = 0, then conditional on X1 = x1, . . . , Xn = xn, we have
√(n − 2) ρ̂ / √(1 − ρ̂²) | X = x ∼ t_{n−2},
and because X and Y are independent under ρ = 0, we also have unconditionally
√(n − 2) ρ̂ / √(1 − ρ̂²) ∼ t_{n−2}.
Proof. The first part is in any elementary statistics book. We shall only prove the second part below. First,
ρ̂ = V = Σ(Xi − X̄)(Yi − Ȳ) / [ (Σ(Xi − X̄)²)^{1/2} (Σ(Yi − Ȳ)²)^{1/2} ]
      = Σ ai Yi / √(Σ Yi² − n Ȳ²), where ai = (Xi − X̄)/√(Σ(Xi − X̄)²),
and
√(1 − ρ̂²) = √( Σ Yi² − n Ȳ² − (Σ ai Yi)² ) / √( Σ Yi² − n Ȳ² ).
Thus,
W = √(n − 2) V / √(1 − V²) = Σ ai Yi / √{ [Σ Yi² − n Ȳ² − (Σ ai Yi)²]/(n − 2) } ≡ Σ ai Yi / √( Q/(n − 2) ).
Note that
Σ ai = 0, Σ ai² = 1,
from which we see that (a1, . . . , an) and (n^{−1/2}, . . . , n^{−1/2}) are orthonormal.
From the expression of W, we can certainly assume that Yi ∼ N(0, 1); otherwise we can always renormalize. If we make an orthogonal transformation from (Y1, . . . , Yn) to (Z1, . . . , Zn) such that Z1 = √n Ȳ and Z2 = Σ_{i=1}^n ai Yi, then
Σ_{i=3}^n Zi² = Σ_{i=1}^n Zi² − Z1² − Z2² = Σ Yi² − n Ȳ² − (Σ ai Yi)² ≡ Q.
Hence
W = Z2 / √( Σ_{i=3}^n Zi²/(n − 2) ) ∼ N(0, 1)/√( χ²_{n−2}/(n − 2) ) ∼ t_{n−2}.
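A quick Monte Carlo sanity check of this distributional claim (our own sketch, not part of the notes):

```python
import numpy as np
from scipy.stats import t as tdist

rng = np.random.default_rng(0)
n, B = 12, 50_000
x = rng.standard_normal((B, n)); y = rng.standard_normal((B, n))   # rho = 0
xc = x - x.mean(1, keepdims=True); yc = y - y.mean(1, keepdims=True)
r = (xc * yc).sum(1) / np.sqrt((xc**2).sum(1) * (yc**2).sum(1))
w = np.sqrt(n - 2) * r / np.sqrt(1 - r**2)
# simulated tail probability should match the t_{n-2} tail
print((w > 2.0).mean(), tdist.sf(2.0, n - 2))
```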
6.2.4 Application 1: one-sample problem.
Example. Let X1, . . . , Xn be iid N(µ, σ²) with both parameters unknown.
(1). Take
θ = −1/(2σ²), U = Σ Xi², ν = nµ/σ², T = X̄.
So the hypotheses H0 : σ² ≤ σ0² versus H1 : σ² > σ0² are equivalent to H0 : θ ≤ θ0 versus H1 : θ > θ0 with θ0 = −1/(2σ0²), since θ = −1/(2σ²) is a strictly increasing function of σ².
When θ = θ0 (i.e., σ² = σ0²), T = X̄ is complete and sufficient for µ. Take
V = (n − 1)S²/σ0² = (Σ Xi² − n X̄²)/σ0² = (U − nT²)/σ0² = h(U, T) ∼ χ²_{n−1}.
So V ∼ χ²_{n−1} and hence is ancillary and thus independent of T. Also, h(U, T) is increasing in U for each T. Hence, a UMPU test of size α is
ψ(v) = I{v ≥ C0},
where C0 = χ²_{n−1}(1 − α).
(2). For testing H0 : σ² = σ0² versus H1 : σ² ≠ σ0², the UMPU test rejects when V < C1 or V > C2, where the side conditions use the fact vχ²_{n−1}(v) = (n − 1)χ²_{n+1}(v), which follows from (2.13) in the last chapter. So C1, C2 satisfy
∫_{C1}^{C2} χ²_{n−1}(v) dv = ∫_{C1}^{C2} χ²_{n+1}(v) dv = 1 − α.
If n − 1 ≈ n + 1, then C1 , C2 are nearly the (α/2)th and (1 − α/2)th quantiles of χ2n−1 ,
respectively (since they are roughly the solutions which are unique). This is, the UMPU
test here is nearly the same as the “equal-tailed” chi-square test in elementary textbooks.
(3). Similarly, a one-sided UMPU test has the form
ψ(v) = I{v ≥ C0},
where C0 satisfies Eθ0 ψ(V) = α.
(4). As in (3), assume that µ0 = 0. The transformation in (3) cannot be used to test H0 : θ ≤ 0 versus H1 : θ > 0 since V is not linear in U. Let us try a new transformation:
W = W(U, T) = √n X̄ / √(Σ Xi²) = √n U/√T = (√n X̄/σ) / √(Σ Xi²/σ²) = √n Z̄ / √(Σ Zi²), where Zi = Xi/σ.
Since (Z1, . . . , Zn) ∼i.i.d. N(0, 1), the distribution of W does not depend on σ²; hence W is ancillary and thus is independent of T = Σ Xi², which is complete and sufficient for σ². Furthermore, we note that W is linear in U and has a symmetric distribution under H0. The symmetry of the distribution of W follows from
−W = √n Ȳ / √(Σ Yi²), where Yi = −Xi ∼ N(0, σ²).
Hence a UMPU test of size α rejects when |W| ≥ C0, where C0 satisfies
Eθ0 ψ(W) = Pθ0{|W| ≥ C0} = α, equivalently Pθ0{W ≥ C0} = α/2. (2.5)
(The second constraint follows from the first one if W has a symmetric distribution.)
Now from the following identity
V = √(n − 1) W / √(1 − W²),
we see that V is strictly increasing in W, and V also has a symmetric distribution under H0 (since −V = √(n − 1)(−W)/√(1 − (−W)²) has the same distribution as V, −W having the same distribution as W). Noting that V is an odd function of W, it follows from (2.4) and (2.5) that a UMPU test of size α is
ψ(v) = I{|v| ≥ D0},
where D0 satisfies Pθ0{|V| ≥ D0} = α, i.e., D0 = t_{n−1}(1 − α/2).
In other words, we reject H0 iff |V| > t_{n−1}(1 − α/2). This is the t-test used in elementary statistics textbooks.
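For reference, this is exactly the test scipy runs; the data below are simulated:

```python
import numpy as np
from scipy.stats import ttest_1samp

rng = np.random.default_rng(1)
x = rng.normal(loc=0.4, scale=1.0, size=20)
stat, p = ttest_1samp(x, popmean=0.0)   # V = sqrt(n) * xbar / S
print(stat, p)                          # reject iff |V| > t_{n-1}(1 - alpha/2)
```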
6.2.5 Application 2: two-sample problem.
Example. Let X1 , ..., Xm ∼ N (µ1 , σ12 ) and Y1 , . . . , Yn ∼ N (µ2 , σ22 ) with unknown µi ’s
and σi²'s. Find UMPU tests for testing
(1) H0 : σ2²/σ1² ≤ ∆0 v.s. H1 : σ2²/σ1² > ∆0; (2) H0 : σ2²/σ1² = ∆0 v.s. H1 : σ2²/σ1² ≠ ∆0; (3) H0 : µ1 = µ2 v.s. H1 : µ1 ≠ µ2 given σ1² = σ2².
(1). So we can take
θ = −[ 1/(2σ2²) − 1/(2σ1²∆0) ], ν = ( −1/(2σ1²), mµ1/σ1², nµ2/σ2² ),
U = Σ_{j=1}^n Yj², T = ( Σ_{i=1}^m Xi² + Σ_{j=1}^n Yj²/∆0, X̄, Ȳ ).
So H0 : σ2²/σ1² ≤ ∆0 v.s. H1 : σ2²/σ1² > ∆0 is equivalent to H0 : θ ≤ 0 v.s. H1 : θ > 0.
Take
V = [Σ(Yj − Ȳ)²/∆0] / [ Σ(Xi − X̄)² + Σ(Yj − Ȳ)²/∆0 ],
a strictly increasing function of W, where W = [S2²/σ2²]/[S1²/σ1²] ∼ F_{n−1,m−1} on the boundary σ2² = ∆0σ1². Clearly, V is ancillary and hence independent of T. Also, V(U, T) is increasing in U for each T and is linear in U. Hence, a UMPU test of size α is
ψ(v) = I{v ≥ C0},
or equivalently
ψ(w) = I{w ≥ D0},
where D0 satisfies Eθ0 ψ(W) = Pθ0{F_{n−1,m−1} ≥ D0} = α. This is the F-test in elementary statistics textbooks.
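A minimal sketch of this F-test (our own function; data simulated):

```python
import numpy as np
from scipy.stats import f as fdist

def var_ratio_test(x, y, delta0=1.0, alpha=0.05):
    """Reject H0: sigma2^2/sigma1^2 <= delta0 iff (S2^2/delta0)/S1^2 > F quantile."""
    m, n = len(x), len(y)
    w = (np.var(y, ddof=1) / delta0) / np.var(x, ddof=1)  # ~ F_{n-1, m-1} on boundary
    return w, w > fdist.ppf(1 - alpha, n - 1, m - 1)

rng = np.random.default_rng(2)
print(var_ratio_test(rng.normal(0, 1, 30), rng.normal(0, 2, 25)))
```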
(2). We provide two methods here.
ψ(v) = I{v ≤ C1 or v ≥ C2},
where C1, C2 satisfy
Eθ0[1 − ψ(V)] = ∫_{C1}^{C2} B_{(n−1)/2, (m−1)/2}(v) dv = 1 − α,
Eθ0{V[1 − ψ(V)]} = ∫_{C1}^{C2} v B_{(n−1)/2, (m−1)/2}(v) dv = Eθ0(V) ∫_{C1}^{C2} B_{(n+1)/2, (m−1)/2}(v) dv = (1 − α) Eθ0(V).
It can be shown that this UMPU test can be approximated by the F-test which rejects H0 : θ = 0 iff F < F_{(n−1),(m−1)}(α/2) or F > F_{(n−1),(m−1)}(1 − α/2). See Method 2 below.
Lemma 6.2.1 Let X ∼ χ²_m and Y ∼ χ²_n be independent. Then
U = X/(X + Y) ∼ Beta(m/2, n/2), V = X + Y ∼ χ²_{m+n},
and U and V are independent.
Proof. Define U = X/(X + Y) and V = X + Y. Then the inverse is X = UV and Y = V(1 − U), with Jacobian ∂(X, Y)/∂(U, V) = V. So the joint density factors, and
fU(u) = C u^{m/2−1}(1 − u)^{n/2−1}, fV(v) = C v^{(m+n)/2−1} e^{−v/2}.
Therefore, U and V are independent, and U ∼ Beta(m/2, n/2), V ∼ χ²_{m+n}.
Method 2. Write
V = F/(1 + F),
where F = W2/W1 ∼ F_{(n−1),(m−1)}. Since v = f/(1 + f) is strictly increasing in f, the region V ≤ C1 or V ≥ C2 corresponds to F ≤ F1 or F ≥ F2 for suitable constants F1, F2.
(3). Given σ1² = σ2² = σ², we can write the joint p.d.f.
f(x, y) = C(µ1, µ2, σ²) exp{ −(1/(2σ²)) [ Σ_{i=1}^m xi² + Σ_{j=1}^n yj² ] + (1/σ²)(mµ1 x̄ + nµ2 ȳ) }.
But
(1/m + 1/n)(mµ1 x̄ + nµ2 ȳ) = µ1 x̄ + µ2 ȳ + (m/n)µ1 x̄ + (n/m)µ2 ȳ
 = (µ2 − µ1)(ȳ − x̄) + µ1 ȳ + µ2 x̄ + (m/n)µ1 x̄ + (n/m)µ2 ȳ
 = (µ2 − µ1)(ȳ − x̄) + (µ1/n)(m x̄ + n ȳ) + (µ2/m)(m x̄ + n ȳ)
 = (µ2 − µ1)(ȳ − x̄) + [(mµ1 + nµ2)/(mn)](m x̄ + n ȳ).
So
f(x, y) = C exp{ (µ2 − µ1)(ȳ − x̄)/[σ²(1/m + 1/n)] + (mµ1 + nµ2)(m x̄ + n ȳ)/[(m + n)σ²] − (1/(2σ²))[Σ_{i=1}^m xi² + Σ_{j=1}^n yj²] }.
We take
θ = (µ2 − µ1)/[σ²(1/m + 1/n)], U = Ȳ − X̄,
ν = ( (mµ1 + nµ2)/[(m + n)σ²], −1/(2σ²) ), T = ( mX̄ + nȲ, Σ_{i=1}^m Xi² + Σ_{j=1}^n Yj² ).
Clearly, the numerator V1 ∼ N(0, 1) and the denominator V2 ∼ χ²_{m+n−2}, and they are independent since X̄, Ȳ, Σ_{i=1}^m (Xi − X̄)² and Σ_{j=1}^n (Yj − Ȳ)² are independent. Therefore, we get V ∼ t_{m+n−2}, which does not depend on any parameters, and hence is ancillary and independent of T. Now we express V in terms of U and T. From
U = Ȳ − X̄ and T1 = mX̄ + nȲ,
we have
X̄ = (T1 − nU)/(m + n), Ȳ = (T1 + mU)/(m + n).
So
m X̄² + n Ȳ² = [ (mT1² − 2mnT1U + mn²U²) + (nT1² + 2mnT1U + m²nU²) ]/(m + n)²
             = T1²/(m + n) + mn U²/(m + n).
Remark 6.2.1 This part is similar to part (4) as in the one-sample problem. See Lehmann
(1986, p203).
W = (Ȳ − X̄) / √{ Σ_{i=1}^m Xi² + Σ_{j=1}^n Yj² − (mX̄ + nȲ)²/(m + n) }.
Centering at the common mean µ and dividing numerator and denominator by σ (which leaves W unchanged),
W = [ (Ȳ − µ) − (X̄ − µ) ]/σ / √{ Σ(Xi − µ)²/σ² + Σ(Yj − µ)²/σ² − [m(X̄ − µ) + n(Ȳ − µ)]²/[(m + n)σ²] }
  = (Ȳ′ − X̄′) / √{ Σ Xi′² + Σ Yj′² − (mX̄′ + nȲ′)²/(m + n) },
where Xi′ ∼ N(0, 1) and Yj′ ∼ N(0, 1) and they are all independent of each other. Note that
−W = (Ȳ″ − X̄″) / √{ Σ Xi″² + Σ Yj″² − (mX̄″ + nȲ″)²/(m + n) },
where Xi″ = −Xi′ ∼ N(0, 1) and Yj″ = −Yj′ ∼ N(0, 1) are all independent of each other. Clearly, W has a symmetric distribution.
On the other hand, the corresponding t-statistic is a strictly increasing odd function of W, which therefore also has a symmetric distribution since W =d −W. This completes the proof of the lemma.
6.2.6 Application 3: Testing for independence in the bivariate
normal family.
Eg. Suppose that (X1, Y1), . . . , (Xn, Yn) are iid with pdf
f(x, y) = [1/(2πσ1σ2√(1 − ρ²))] exp{ −(x − µ1)²/(2σ1²(1 − ρ²)) + ρ(x − µ1)(y − µ2)/(σ1σ2(1 − ρ²)) − (y − µ2)²/(2σ2²(1 − ρ²)) }.
So we can take
θ = ρ/(σ1σ2(1 − ρ²)), ν = (ν1, . . . , ν4),
U = Σ Xi Yi, T = ( Σ Xi, Σ Yi, Σ Xi², Σ Yi² ).
Therefore, testing
H0 : ρ = 0, versus H1 : ρ 6= 0
is equivalent to
H0 : θ = 0, versus H1 : θ 6= 0.
Given θ = 0, T is complete and sufficient for ν. Define
V = ρ̂ = Σ(Xi − X̄)(Yi − Ȳ) / [ (Σ(Xi − X̄)²)^{1/2} (Σ(Yi − Ȳ)²)^{1/2} ]
  = [ Σ XiYi − n^{−1}(Σ Xi)(Σ Yi) ] / [ (Σ Xi² − n^{−1}(Σ Xi)²)^{1/2} (Σ Yi² − n^{−1}(Σ Yi)²)^{1/2} ]
  = (U − n^{−1} T1 T2) / [ (T3 − n^{−1} T1²)^{1/2} (T4 − n^{−1} T2²)^{1/2} ].
Clearly, in the above expression, if we replace Xi and Yi by (Xi − µ1)/σ1 and (Yi − µ2)/σ2, which are distributed as N(0, 1), the value of V remains the same; therefore V is ancillary. By Basu's theorem, V is independent of T. Furthermore, V is linear in U. So a UMPU test rejects when V < C1 or V > C2, with the constants determined from
Eθ0=0{ψ(V)} = α,
Eθ0=0{Vψ(V)} = α Eθ0=0[V].
Define
W = √(n − 2) V / √(1 − V²),
which is strictly increasing in V. Under H0 : ρ = 0, it can be shown (Theorem 6.2.3) that W ∼ t_{n−2}. So a UMPU test of size α rejects H0 iff |W| > t_{n−2}(1 − α/2).
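A minimal sketch of this correlation test (our own function name; simulated usage):

```python
import numpy as np
from scipy.stats import t as tdist

def corr_test(x, y, alpha=0.05):
    """Reject H0: rho = 0 iff |W| > t_{n-2}(1 - alpha/2), W = sqrt(n-2) V / sqrt(1 - V^2)."""
    n = len(x)
    v = np.corrcoef(x, y)[0, 1]
    w = np.sqrt(n - 2) * v / np.sqrt(1 - v**2)
    return w, abs(w) > tdist.ppf(1 - alpha / 2, n - 2)

rng = np.random.default_rng(5)
x = rng.standard_normal(40); y = 0.5 * x + rng.standard_normal(40)
print(corr_test(x, y))
```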
6.2.7 Application 4: Regression.
Theorem 6.2.4 Let (x1 , Y1 ), ..., (xn , Yn ) follow the linear regression
Yi = α + βxi + ²i , i = 1, ..., n.
where ²i are iid N (0, σ 2 ). Define ρ = cα + dβ for some known constants c and d. Find a
UMPU test for testing
H0 : ρ = ρ0 , versus H1 : ρ 6= ρ0 .
where
γ = α + βx̄, δ = β √(Σ(xi − x̄)²), vi = (xi − x̄)/√(Σ(xi − x̄)²).
Then we have
β = δ/√(Σ(xi − x̄)²), α = γ − δ x̄/√(Σ(xi − x̄)²),
and ρ = cα + dβ = aγ + bδ. Note also that
Σ vi = 0, Σ vi² = 1.
So we can take
θ = δ/σ², ν1 = −1/(2σ²), ν2 = γ/σ²,
U = Σ vi Yi, T1 = Σ Yi², T2 = Σ Yi.
.....................................
6.3 The LSE in linear models
One of the most useful statistical models for non-iid data in applications is the following linear model
Xi = βZiτ + εi , i = 1, . . . , n, (3.9)
where Xi is the ith response random variable, β is a p-vector of unknown parameters with p ≤ n, Zi is the ith p-vector of (non-random) covariates, and the εi's are random errors.
Let X = (X1, . . . , Xn), ε = (ε1, . . . , εn), and let Z be the n × p matrix whose ith row is Zi, i = 1, . . . , n. Then we can write (3.9) as (X1, . . . , Xn) = (βZ1τ, . . . , βZnτ) + (ε1, . . . , εn), or in the following matrix format:
X = βZτ + ε.
The most commonly used estimator is the least squares estimator (LSE), defined to be β̂ such that
‖X − β̂Zτ‖² = min_{b∈Rᵖ} ‖X − bZτ‖².
(i). If the rank of Z is p (i.e., full rank), then ZτZ is of full rank p, and there is a unique LSE given by
β̂ = XZ(ZτZ)⁻¹.
(ii). If the rank of Z is < p (i.e., not of full rank), then ZτZ is not of full rank, and there are infinitely many LSE's of β, all of which take the form
β̂ = XZ(ZτZ)⁻,
where (ZτZ)⁻ is any generalized inverse, i.e.,
(ZτZ)(ZτZ)⁻(ZτZ) = ZτZ.
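Both cases are one line in numpy (a sketch with simulated data; variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 50, 3
Z = rng.standard_normal((n, p))           # covariate matrix, ith row Z_i
beta = np.array([1.0, -2.0, 0.5])
X = Z @ beta + rng.standard_normal(n)     # the linear model (3.9)

# full-rank case: beta_hat = X Z (Z^T Z)^{-1}
beta_hat = np.linalg.solve(Z.T @ Z, Z.T @ X)
# rank-deficient case: use a generalized inverse (Z^T Z)^-
beta_hat_g = np.linalg.pinv(Z.T @ Z) @ (Z.T @ X)
print(beta_hat, beta_hat_g)
```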
6.4 Summary
This chapter is concerned with UMPU tests again for multi-parameter exponential fam-
ilies. With further conditions, we could reduce the UMPU tests in conditional form to
unconditional form. The results have been applied to some well known problems studied
in elementary statistics courses.
6.4.1 Exercises
1. Let X1, . . . , Xn be iid from the gamma distribution Γ(α, γ) with unknown α and γ, whose p.d.f. is
f(x) = [1/(Γ(α)γ^α)] x^{α−1} e^{−x/γ} I{x > 0}.
(i). For testing H0 : α ≤ α0 versus H1 : α > α0 and H0 : α = α0 versus H1 : α ≠ α0, show that there exist UMPU tests whose rejections are based on W = Π_{i=1}^n (Xi/X̄).
(ii). For testing H0 : γ ≤ γ0 versus H1 : γ > γ0, show that a UMPU test rejects H0 when Σ_{i=1}^n Xi > C(Π_{i=1}^n Xi). (Here, C(t) is a function of t.)
(i). Show that there exists a UMPU test for testing H0 : p1 ≤ p2 versus H1 : p1 > p2.
(ii). Determine the conditional distribution P_{U|T=t} when r1 = r2 = 1.
Chapter 7
UMP, UMPU (and UMPI) tests often do not exist in a particular problem. In this chapter, we shall introduce other tests. These tests may not be optimal, but they are very general, easy to use, and have intuitive appeal. They often coincide with optimal tests (UMP, UMPU tests). They play a role similar to that of the MLE in estimation theory.
Throughout the chapter, we assume that a sample
X ∼ F ∈ F = {Fθ : θ ∈ Θ},
and we wish to test
H0 : θ ∈ Θ0 , versus H1 : θ ∈ Θ1 , (0.1)
where Θ0 ∪ Θ1 = Θ and Θ0 ∩ Θ1 = ∅.
Remark 7.1.1 .
(iii). If the distribution of λ(X) is continuous, then we can find a test of size
α exactly. If its distribution is discrete, this may not be achieved. However,
we can use randomization to achieve the exact size.
(iv). LRT is a method for deriving tests; however, no optimality properties are guaranteed in general. Fortunately, in many situations LRT's are optimal (either UMP, UMPU or UMPI, etc.), as illustrated in the next section.
(v). LRT in hypothesis testing plays a similar role as MLE in parametric
estimation. Both are generally applicable, may not always have good finite-
sample properties but they do have good asymptotic properties.
7.2 LRT with no nuisance parameters.
We shall show that when there are no nuisance parameters, LRT tests are often optimal (either UMP, UMPU or UMPI). But we first start with a couple of examples.
Example 1. (Normal mean, known variance.) Let X1, . . . , Xn be iid N(θ, 1). Find the LRT of size α for testing H0 : θ = θ0 versus H1 : θ ≠ θ0.
Solution.
λ(x) ≡ sup_{θ∈Θ0} l(θ) / sup_{θ∈Θ} l(θ)
 = exp{−½ Σ(xi − θ0)²} / exp{−½ Σ(xi − x̄)²}
 = exp{−½ Σ(xi − x̄)² − (n/2)(x̄ − θ0)²} / exp{−½ Σ(xi − x̄)²}
 = exp{ −(n/2)(x̄ − θ0)² }.
Thus λ(x) < C iff (x̄ − θ0)² > C1 iff |x̄ − θ0| > C0, where C0 is determined from
sup_{θ∈Θ0} Pθ( |X̄ − θ0| > C0 ) = Pθ0( √n|X̄ − θ0| > C0√n ) = 2[1 − Φ(C0√n)] = α,
that is, C0 = Φ^{−1}(1 − α/2)/√n.
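A two-line numerical check of this cutoff (our own sketch):

```python
from math import sqrt
from scipy.stats import norm

n, alpha = 25, 0.05
C0 = norm.ppf(1 - alpha / 2) / sqrt(n)   # reject iff |xbar - theta0| > C0
print(C0)
```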
Remark 7.2.1 The LRT test is the same as the UMPU test.
Example 2. (Binomial distribution) Let X1 , ..., Xn ∼ Bin(1, θ) with 0 ≤ θ ≤ 1. Find
the LRT of H0 : θ ≤ θ0 versus H1 : θ > θ0 .
The global MLE for θ is θ̂ = Σ xi/n = x̄ and
l(θ̂) = x̄^{nx̄} (1 − x̄)^{n(1−x̄)}.
It is easy to check that
l′(θ) = θ^{nx̄−1} (1 − θ)^{n(1−x̄)−1} n(x̄ − θ).
That is, l′(θ) > 0, = 0, < 0 according as θ < x̄, = x̄, > x̄. So l(θ) first increases, achieves its maximum at x̄, and then decreases. It follows from the plot that
λ(x) ≡ sup_{θ∈Θ0} l(θ) / sup_{θ∈Θ} l(θ)
 = 1 if θ̂ ≤ θ0 ,
 = θ0^{nx̄} (1 − θ0)^{n(1−x̄)} / [ x̄^{nx̄} (1 − x̄)^{n(1−x̄)} ] if θ̂ > θ0 .
Denote
g(x) = (1/n) log[ θ0^{nx} (1 − θ0)^{n(1−x)} / ( x^{nx} (1 − x)^{n(1−x)} ) ]
     = x log θ0 + (1 − x) log(1 − θ0) − x log x − (1 − x) log(1 − x).
So for x > θ0, we have
g′(x) = log θ0 − log(1 − θ0) − log x + log(1 − x) = log( θ0/(1 − θ0) ) − log( x/(1 − x) ) < 0.
Thus, g(x) is decreasing in x. Hence for θ̂ > θ0, we see that λ(x) is also decreasing in θ̂. Thus λ(x) < C iff Σ Xi > C0, where
sup_{θ≤θ0} Pθ( Σ Xi > C0 ) = Pθ0( Σ Xi > C0 ) = α. (2.2)
Remark 7.2.2 The LRT test is also the same as the UMPU test.
Remark 7.2.3 The first equality in (2.2) follows from the fact that
g(θ) = Pθ( Σ Xi > C0 ) = Σ_{k=C0+1}^{n} C_n^k θ^k (1 − θ)^{n−k}
is increasing in θ.
7.2.2 LRT in one-parameter exponential family.
We have seen from the last examples that LRT are equivalent to UMP or UMPU tests.
In fact, this is true in general for one-parameter exponential families.
(a). For testing H0 : θ ≤ θ0 versus H1 : θ > θ0, there exists an LRT whose rejection region is T(X) > C0, which is of the same form as that of the UMP test.
(b). For testing H0 : θ ≤ θ1 or θ ≥ θ2 versus H1 : θ1 < θ < θ2, there exists an LRT whose rejection region is C1 < T(X) < C2, which is of the same form as that of the UMP test.
(c). For testing H0 : θ1 ≤ θ ≤ θ2 versus H1 : θ < θ1 or θ > θ2, there exists an LRT whose rejection region is T(X) < C1 or T(X) > C2, which is of the same form as that of the UMPU test.
(d). For testing
H0 : θ = θ0 , versus H1 : θ ≠ θ0 ,
there exists an LRT whose rejection region is T(X) < C1 or T(X) > C2, which is of the same form as that of the UMPU test.
Proof. Since η(θ) is a strictly increasing function of θ, so the hypotheses are equivalent
to testing η. For instance,
H0 : θ ≤ θ0 , versus H1 : θ > θ0
is the same as
A reparametrization results in
Setting the first derivative to zero, we get the MLE η̂ satisfying
ξ′(η̂) = T(x) ≡ t, thus η̂ = η̂(T(x)) = η̂(t),
and
dt/dη̂ = ξ″(η̂) > 0.
So l(η) first increases strictly up to η̂ and then decreases strictly after that. Thus,
λ(x) ≡ sup_{θ∈Θ0} l(η) / sup_{θ∈Θ} l(η)
 = 1 if η̂ ≤ η0 ,
 = l(η0)/l(η̂) = e^{g(t|η0)} if η̂ > η0 ,
where
g(t|η0) = log[ l(η0)/l(η̂) ] = log l(η0) − log l(η̂) = (η0 − η̂)t − [ ξ(η0) − ξ(η̂) ].
So
g′(t|η0) = [ −t + ξ′(η̂) ](dη̂/dt) + (η0 − η̂) = η0 − η̂ (using ξ′(η̂) = t),
g″(t|η0) = −dη̂/dt < 0.
In particular, for η0 < η̂, we have g′(t|η0) < 0, which means that g(t|η0) is strictly decreasing in t. Hence for η̂ > η0, we see that λ(x) is strictly decreasing in t. Thus we reject H0 iff λ(x) < C iff T(x) > C0, which is the rejection region of the UMP test.
(b). The proof is similar to that in (a). Note that
λ(x) ≡ sup_{θ∈Θ0} l(η) / sup_{θ∈Θ} l(η)
 = 1 if η̂ < η1 or η̂ > η2 ,
 = max{ l(η1), l(η2) }/l(η̂) = max{ l(η1)/l(η̂), l(η2)/l(η̂) } = exp{ max{ g(t|η1), g(t|η2) } } if η1 ≤ η̂ ≤ η2 .
Similarly to the first part of the proof, if η1 ≤ η̂ ≤ η2, we see that g(t|η1) is strictly decreasing in t and g(t|η2) is strictly increasing in t.
Hence we reject H0 iff λ(x) < C iff g(t|η1) < C1′ and g(t|η2) < C2′ iff C1 < T(x) < C2.
The proof is complete.
Remark 7.2.4 .
(i). Although the rejection regions of LRT and UMP (or UMPU, etc.) tests are of the same form, size-α versions of the two need not agree on their critical points (i.e., the Ci's may be different for LRT and UMP or UMPU tests); see the normal-variance example later in this chapter.
(ii). In some situations, LRT and UMP (or UMPU) tests of size α are the same. For instance, for one-sided tests they are always the same. For two-sided tests, they are the same if the test statistic is symmetric, since the two side conditions reduce to one in this case.
Example. [Uniform distribution] Let X1 , ..., Xn ∼ U nif (0, θ) with θ > 0. Recall the
UMP test of size α for testing H0 : θ = θ0 versus H1 : θ 6= θ0 is
ψ(x) = I{x(n) > θ0 , or x(n) ≤ θ0 α1/n }
Show that this is also the LRT of size α.
Solution. The likelihood is l(θ) = f(x) = θ^{−n} I{x(n) < θ}. The unrestricted MLE for θ is θ̂ = x(n) and
λ(x) ≡ sup_{θ∈Θ0} l(θ) / sup_{θ∈Θ} l(θ) = l(θ0) / sup_{θ∈Θ} l(θ)
 = θ0^{−n} I{x(n) < θ0} / x(n)^{−n} = (x(n)/θ0)^n I{x(n) < θ0}
 = (x(n)/θ0)^n if x(n) < θ0 ,
 = 0 if x(n) > θ0 .
From the plot of λ(x) versus x(n), we see that the rejection region λ(x) < C is X(n) > θ0 or X(n)/θ0 < C0, where C0 = α^{1/n} in order to make the test of size α.
Example. Let X1, . . . , Xn be iid with density f(x) = e^{−(x−θ)} I{x > θ}.
(a). Find an LRT of size α for testing H0 : θ ≤ θ0 versus H1 : θ > θ0.
(b). Is it the same as the UMP test?
Solution. (a).
λ(x) = 1 if x(1) < θ0 ,
     = e^{n(θ0 − x(1))} if x(1) ≥ θ0 .
Note that λ(x) is a continuous and nonincreasing function of x(1). (In fact, it stays flat for x(1) < θ0 and gradually decreases to zero afterwards.) Thus λ(x) < C iff x(1) > C0, where C0 is decided by the size of the test.
(b). As an exercise. (Check.)
7.4 LRT with nuisance parameters
LRT has been seen to coincide with optimal tests in many situations when there are no
nuisance parameters. However, LRT proves even more useful when there are nuisance
parameters. We also start with some examples.
Remark 7.4.1 .
(a). The LRT test is exactly the same as the UMPU test.
(b). In this case, the exact distribution of λ(X) is known. Its approximate
distribution can be found as follows.
−2 log λ(X) = n log( 1 + [√n(X̄ − µ0)/S]²/(n − 1) )
            ≈ (n/(n − 1)) [√n(X̄ − µ0)/S]² + op(1)
            ∼approx χ²1 ,
(a). Find the LRT of size α for testing H0 : σ12 = σ22 versus H1 : σ12 6= σ22 . Is it
the same as the UMPU test?
(b). Find the LRT of size α for testing H0 : µ1 = µ2 versus H1 : µ1 6= µ2 ,
assuming that σ12 = σ22 = σ 2 . Is it the same as the UMPU test?
Solution.
(a). See homework 5 for solutions. It turns out that LRT and UMPU test are the
same.
(b). The likelihood is
l(θ) = f(x)f(y) = (2πσ²)^{−(m+n)/2} exp{ −(1/(2σ²)) [ Σ_{i=1}^m (xi − µ1)² + Σ_{j=1}^n (yj − µ2)² ] }.
Here θ = (µ1, µ2, σ²).
The unrestricted MLE for θ is θ̂ = (µ̂1, µ̂2, σ̂p²) = (X̄, Ȳ, σ̂p²), where
σ̂p² = [1/(m + n)] [ Σ_{i=1}^m (xi − x̄)² + Σ_{j=1}^n (yj − ȳ)² ] = (mσ̂1² + nσ̂2²)/(m + n) (pooled sample variance),
and
sup_{θ∈Θ} l(θ) = l(θ̂) = (2πσ̂p²)^{−(m+n)/2} exp{ −(1/(2σ̂p²)) [ Σ(xi − x̄)² + Σ(yj − ȳ)² ] }
              = (2πσ̂p²)^{−(m+n)/2} exp{ −(m + n)/2 }.
Under H0 : µ1 = µ2, the restricted MLE for θ0 = (µ, µ, σ²) is θ̂0 = (µ̂0, µ̂0, σ̂0²), where
µ̂0 = (mx̄ + nȳ)/(m + n), σ̂0² = [ Σ(xi − µ̂0)² + Σ(yj − µ̂0)² ]/(m + n).
Thus,
sup_{θ∈Θ0} l(θ) = l(θ̂0) = (2πσ̂0²)^{−(m+n)/2} exp{ −(m + n)/2 }.
Note that
(m + n)σ̂0² = Σ(xi − µ̂0)² + Σ(yj − µ̂0)²
           = Σ(xi − x̄)² + Σ(yj − ȳ)² + m(x̄ − µ̂0)² + n(ȳ − µ̂0)²
           = (m + n)σ̂p² + m[ n(ȳ − x̄)/(m + n) ]² + n[ m(ȳ − x̄)/(m + n) ]²
           = (m + n)σ̂p² + [ mn²/(m + n)² + nm²/(m + n)² ](ȳ − x̄)²
           = (m + n)σ̂p² + mn(ȳ − x̄)²/(m + n),
so
σ̂0² = σ̂p² + mn(ȳ − x̄)²/(m + n)².
Therefore,
λ(x) ≡ sup_{θ∈Θ0} l(θ) / sup_{θ∈Θ} l(θ) = ( σ̂p²/σ̂0² )^{(m+n)/2}
 = ( σ̂p² / [ σ̂p² + mn(ȳ − x̄)²/(m + n)² ] )^{(m+n)/2}
 = ( 1 / [ 1 + mn(ȳ − x̄)²/((m + n)²σ̂p²) ] )^{(m+n)/2}
 = ( 1 / [ 1 + (ȳ − x̄)²/( (m + n)σ̂p²(1/m + 1/n) ) ] )^{(m+n)/2}
 = ( 1 / [ 1 + [ (ȳ − x̄)/( Sp(1/m + 1/n)^{1/2} ) ]² (m + n − 2)^{−1} ] )^{(m+n)/2} ,
where Sp² = (m + n)σ̂p²/(m + n − 2); thus λ(x) is a decreasing function of the two-sample t-statistic |ȳ − x̄|/[Sp(1/m + 1/n)^{1/2}].
Remark 7.4.2 .
(a). The LRT test is exactly the same as the UMPU test.
(b). In this case, the exact distribution of λ(X) is known. As in the one-sample
case, we can also find its approximate distribution
Example. (Regression problem). One-sample and two-sample problems considered
above are special cases of the general regression problems. For a general solution, see
Example 6.21 in Shao (1999), p382.
The unrestricted MLE for θ is θ̂ = (µ̂1, µ̂2, σ̂1², σ̂2², ρ̂), where
µ̂1 = x̄, µ̂2 = ȳ, σ̂1² = (1/n)Σ(xi − x̄)², σ̂2² = (1/n)Σ(yi − ȳ)²,
ρ̂ = Σ(xi − x̄)(yi − ȳ) / √{ Σ(xi − x̄)² Σ(yi − ȳ)² } = n^{−1} Σ(xi − x̄)(yi − ȳ)/(σ̂1σ̂2),
and
l(θ̂) = [2πσ̂1σ̂2(1 − ρ̂²)^{1/2}]^{−n} exp{ −(1/(2(1 − ρ̂²))) [ Σ((xi − x̄)/σ̂1)² + Σ((yi − ȳ)/σ̂2)² − 2ρ̂ Σ((xi − x̄)/σ̂1)((yi − ȳ)/σ̂2) ] }
     = [2πσ̂1σ̂2(1 − ρ̂²)^{1/2}]^{−n} exp{ −(1/(2(1 − ρ̂²))) · 2n(1 − ρ̂²) }
     = e^{−n} / [2πσ̂1σ̂2(1 − ρ̂²)^{1/2}]^{n} .
On the other hand, the restricted MLE for θ when ρ = 0 is θ̂0 = (µ̂1, µ̂2, σ̂1², σ̂2²), defined as before, and
l(θ̂0) = [2πσ̂1σ̂2]^{−n} exp{ −(1/2) [ Σ((xi − x̄)/σ̂1)² + Σ((yi − ȳ)/σ̂2)² ] }
       = [2πσ̂1σ̂2]^{−n} exp{ −2n/2 }
       = e^{−n} / [2πσ̂1σ̂2]^{n} .
Thus
λ(x) ≡ sup_{θ∈Θ0} l(θ) / sup_{θ∈Θ} l(θ) = ( 1 − ρ̂² )^{n/2} .
107
Thus λ(x) < C iff |ρ̂| > C1 iff |T| > C0, where
T ≡ √(n − 2) ρ̂ / √(1 − ρ̂²) .
Remark 7.4.3 .
(a). The LRT test is exactly the same as the UMPU test.
(b). In this case, the exact distribution of λ(X) is known. As in the one- and
two- sample case, we can also find its approximate distribution
−2 log λ(X) = −2 log( 1 − ρ̂² )^{n/2} = n log[ 1/(1 − ρ̂²) ]
 = n log( 1 + ρ̂²/(1 − ρ̂²) ) = nρ̂²/(1 − ρ̂²) + · · · = [ √(n − 2) ρ̂/√(1 − ρ̂²) ]² · n/(n − 2) + · · ·
 ∼approx χ²1 .
Example. (LRT and UMPU are of the same form but not identical.) Let
X1 , ..., Xn be a random sample from a normal distribution with unknown parameters µ
and σ 2 .
(i) Find the LRT and UMPU test of size α for testing H0 : σ 2 ≤ σ02 versus
H1 : σ 2 > σ02 . Are they the same?
(ii) Find the LRT and UMPU test of size α for testing H0 : σ 2 = σ02 versus
H1 : σ 2 6= σ02 . Are they the same?
Solution. (i). It can be shown that the LRT is exactly the same as the UMP test; that is, we reject H0 iff (n − 1)S²/σ0² > χ²_{n−1}(1 − α).
(ii). On the other hand, under H0 : σ² = σ0², the restricted MLE for µ is still X̄. Denote θ̂0 = (X̄, σ0²); then we have
l(θ̂0) = (2πσ0²)^{−n/2} exp{ −½ Σ(xi − x̄)²/σ0² } = (2πσ0²)^{−n/2} exp{ −(n/2)(σ̂²/σ0²) }.
Therefore,
λ(x) = l(θ̂0)/l(θ̂) = (σ̂²/σ0²)^{n/2} exp{ −(n/2)(σ̂²/σ0² − 1) },
which is a function of T ≡ nσ̂²/σ0² = (n − 1)S²/σ0² alone, of the form λ ∝ T^{n/2} e^{−T/2}.
In summary, the LRT test of size α is to reject H0 iff T < C1 or T > C2 , where C1
and C2 are determined from the following two equations
∫_{C1}^{C2} χ²_{n−1}(v) dv = 1 − α, (4.5)
C1^n e^{−C1} = C2^n e^{−C2} . (4.6)
Next, we shall look at the UMPU test. Recall that the UMPU test of size α is to reject
H0 iff T < D1 or T > D2 , where D1 and D2 are determined from Eθ0 [1−ψ(T )] = 1−α and
Eθ0 T [1−ψ(T )] = (1−α)Eθ0 T (or Eθ0 [T −ET ][1−ψ(T )] = Eθ0 [T −(n−1)][1−ψ(T )] = 0),
that is,
∫_{D1}^{D2} χ²_{n−1}(v) dv = 1 − α,
∫_{D1}^{D2} [v − (n − 1)] χ²_{n−1}(v) dv = 0.
Integrating by parts,
∫_{D1}^{D2} [v − (n − 1)] χ²_{n−1}(v) dv = [ −2 v^{(n−1)/2} e^{−v/2} / ( Γ((n−1)/2) 2^{(n−1)/2} ) ]_{D1}^{D2}
 = −2 [ D2^{(n−1)/2} e^{−D2/2} − D1^{(n−1)/2} e^{−D1/2} ] / ( Γ((n−1)/2) 2^{(n−1)/2} ),
which is equivalent to D2^{(n−1)/2} e^{−D2/2} = D1^{(n−1)/2} e^{−D1/2}, or
D1^{n−1} e^{−D1} = D2^{n−1} e^{−D2} .
In summary, the UMPU test of size α rejects H0 iff T < D1 or T > D2, where D1 and D2 are determined from
∫_{D1}^{D2} χ²_{n−1}(v) dv = 1 − α, (4.7)
D1^{n−1} e^{−D1} = D2^{n−1} e^{−D2} . (4.8)
Comparing (4.5)–(4.6) and (4.7)–(4.8), we see that the LRT and UMPU test cannot be the same unless the size α = 1. This is because, if C1 = D1 and C2 = D2, then the ratio of (4.6) and (4.8) gives C1 = C2 = D1 = D2, which in turn implies from (4.5) or (4.7) that 1 − α = 0.
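A rough numerical comparison of the two sets of cutoffs (our own sketch; it assumes fsolve stays in the positive region, which the quantile starting point makes likely):

```python
import numpy as np
from scipy.stats import chi2
from scipy.optimize import fsolve

def cuts(n, alpha, power):
    """Solve (4.5) with c^power * e^{-c} matched at both ends:
    power = n gives the LRT cutoffs (4.6); power = n-1 gives the UMPU cutoffs (4.8)."""
    def eqs(c):
        c1, c2 = c
        return [chi2.cdf(c2, n - 1) - chi2.cdf(c1, n - 1) - (1 - alpha),
                power * np.log(c1) - c1 - (power * np.log(c2) - c2)]
    start = [chi2.ppf(alpha / 2, n - 1), chi2.ppf(1 - alpha / 2, n - 1)]
    return fsolve(eqs, start)

n, alpha = 10, 0.05
print(cuts(n, alpha, n), cuts(n, alpha, n - 1))   # close, but not identical
```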
7.5 Bad performance of LRT
Like MLE in estimation, LRT has good large sample asymptotic properties. However, for
fixed sample, it can have bad performance.
Example. (X.R. Chen, p338.) Let X take the 5 values 0, ±1, ±2 with distribution
Pθ(X = 0) = α(1 − θ1)/(1 − α),
Pθ(X = 1) = Pθ(X = −1) = (1/2 − α)(1 − θ1)/(1 − α),
Pθ(X = 2) = θ1θ2 ,
Pθ(X = −2) = θ1(1 − θ2),
where Θ = {(θ1, θ2) : 0 ≤ θ1 ≤ α, 0 ≤ θ2 ≤ 1}, 0 < α < 1/2, and α is fixed. Consider testing
H0 : θ1 = α, θ2 = 1/2 .
Find the LRT of size α and show that it is biased.
Solution. The likelihood is l(θ) = l(θ1, θ2) = Pθ(X = x) for x = 0, ±1, ±2. The unrestricted MLE for θ is θ̂ = (θ̂1, θ̂2), with
(θ̂1 = 0, θ̂2 free) if x = 0, 1, −1;
(θ̂1 = α, θ̂2 = 1) if x = 2;
(θ̂1 = α, θ̂2 = 0) if x = −2,
and hence
l(θ̂) = α/(1 − α) if x = 0,
      = (1/2 − α)/(1 − α) if x = ±1,
      = α if x = ±2.
Under H0, l(θ0) = α for x = 0, 1/2 − α for x = ±1, and α/2 for x = ±2. Thus
λ(x) ≡ sup_{θ∈Θ0} l(θ) / sup_{θ∈Θ} l(θ) = l(θ0)/l(θ̂)
 = 1 − α if x = 0, ±1,
 = 1/2 if x = ±2 .
Since α < 1/2, we have 1/2 < 1 − α, so the LRT of size α rejects exactly when x = ±2; its size is Pθ0(X = ±2) = α. Its power is Power(θ) = Pθ(X = ±2) = θ1 ≤ α, with strict inequality when θ1 < α. Therefore, the test is actually biased since Power(θ) < α for some alternatives. In other words, the LRT is not even as good as the trivial test φ(x) ≡ α.
7.6 Asymptotic χ2 approximation of the LRT.
7.6.1 Why asymptotics?
As seen earlier, sometimes it is very difficult or impossible to find the exact distribution
of λ(X). So approximations in these cases become necessary.
Example. Let θ = (p1, . . . , p5), where Σ_{i=1}^5 pi = 1 and pi ≥ 0, i = 1, . . . , 5. Suppose X1, . . . , Xn are iid discrete r.v.'s with Pθ(Xi = j) = pj, j = 1, . . . , 5. Find an LRT of size α for testing
H0 : p1 = p2 = p3 and p4 = p5 , versus H1 : H0 is not true.
Solution. Let N1, · · · , N5 be the total numbers of observations in the five cells, so that Σ_{i=1}^5 Ni = n. Note that the joint distribution of (N1, · · · , N5) is multinomial(n, p1, · · · , p5) with Σ pi = 1, but the joint distribution of (X1, · · · , Xn) is not (its form is the likelihood l(θ) given below).
Given the observations x = (x1, . . . , xn), the likelihood is
l(θ) = Pθ(X = x) = p1^{N1} · · · p5^{N5} .
Define the Lagrangian L(θ, λ) = Σ Ni log pi + λ(1 − Σ pi). Set
∂L(θ, λ)/∂pi = Ni/pi − λ = 0, i = 1, . . . , 5, (6.9)
∂L(θ, λ)/∂λ = 1 − Σ_{i=1}^5 pi = 0. (6.10)
From (6.9), pi = Ni/λ, so that Σ_{i=1}^5 pi = Σ_{i=1}^5 Ni/λ = n/λ = 1. That is, λ = n. Therefore, the solutions are p̂i = Ni/n. That is, the MLE of θ = (p1, . . . , p5) is p̂ = (N1/n, . . . , N5/n), and
l(θ̂) = p̂1^{N1} · · · p̂5^{N5} = (N1/n)^{N1} · · · (N5/n)^{N5} .
Under H0, write p1 = p2 = p3 and p4 = p5, and define the Lagrangian L(θ, λ) = Σ Ni log pi + λ(1 − 3p1 − 2p4). Set
∂L(θ, λ)/∂p1 = (N1 + N2 + N3)/p1 − 3λ = 0, (6.11)
∂L(θ, λ)/∂p4 = (N4 + N5)/p4 − 2λ = 0, (6.12)
∂L(θ, λ)/∂λ = 1 − 3p1 − 2p4 = 0. (6.13)
From (6.11) and (6.12), we get 3p1 = (N1 + N2 + N3)/λ and 2p4 = (N4 + N5)/λ, which by (6.13) leads to λ = n. Therefore, the solutions are p̂1 = (N1 + N2 + N3)/(3n) and p̂4 = (N4 + N5)/(2n). That is, the MLE of θ under H0 is θ̂0 = (p̂1, p̂1, p̂1, p̂4, p̂4).
Remark 7.6.1 Here, the exact distribution of λ(X) (needed for determining the critical value C) is impossible to find. Even if we wished to use Monte Carlo simulation to do this, we could not generate (N1, . . . , N5) under H0 since the pi's are not fully specified. However, we shall see later that
−2 log λ(X) ∼approx χ²3 ,
since under H0 we have three constraints: p1 = p2, p2 = p3 and p4 = p5.
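The LRT statistic itself is simple to compute from the cell counts; a minimal sketch for this 5-cell problem (function name ours; counts made up):

```python
import numpy as np
from scipy.stats import chi2

def lrt_5cell(N):
    """-2 log lambda for H0: p1 = p2 = p3, p4 = p5, with N = (N1, ..., N5)."""
    N = np.asarray(N, float); n = N.sum()
    p_hat = N / n                                    # unrestricted MLE
    p0 = np.array(3 * [N[:3].sum() / (3 * n)] + 2 * [N[3:].sum() / (2 * n)])
    with np.errstate(divide='ignore', invalid='ignore'):
        stat = 2 * np.nansum(N * (np.log(p_hat) - np.log(p0)))
    return stat, chi2.sf(stat, df=3)                 # r = 3 constraints

print(lrt_5cell([30, 25, 20, 15, 10]))
```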
A slightly different example is given next, where we shall try to find the asymptotic
distribution of LRT.
Example. Let θ = (p1, . . . , pr), where Σ_{i=1}^r pi = 1 and pi ≥ 0, i = 1, . . . , r. Suppose X1, . . . , Xn are iid discrete r.v.'s with Pθ(Xi = j) = pj, j = 1, . . . , r. Find an LRT of size α for testing
H0 : pi = pi0 , i = 1, . . . , r, versus H1 : H0 is not true,
where θ0 = (p10, . . . , pr0) is a known probability vector.
7.6.2 Review: Asymptotic properties of MLE
The asymptotic distribution of the LRT critically depends on the properties of MLE’s.
So first, we shall list some regularity conditions and results concerning MLE.
Theorem 7.6.1 Let F = {Fθ : θ ∈ Θ}, where Θ is an open set in R^k. Assume that
(1). For each θ ∈ Θ, ∂³ log fθ(x)/∂θ³ exists for all x, and there exists a function H(x) ≥ 0 (possibly depending on θ) such that for θ′ ∈ N(θ, ε) = {θ′ : ‖θ′ − θ‖ < ε},
‖ ∂³ log fθ′(x)/∂θ′³ ‖ ≤ H(x), Eθ H(X1) < ∞.
Let X1, . . . , Xn ∼ Fθ. Then with probability 1, the likelihood equations admit a sequence of solutions {θ̂n} satisfying
1). Strong consistency: θ̂n → θ with probability 1.
2). Asymptotic efficiency: θ̂n is asymptotically N(θ, [nI_{X1}(θ)]^{−1}).
Remark 7.6.2 .
(1). Condition 1 ensures that ∂ log fθ(x)/∂θ, for any x, has a Taylor expansion as a function of θ.
(2). Condition 2 means that
∫ fθ(x) dx and ∫ [∂ log fθ(x)/∂θ] fθ(x) dx
can be differentiated with respect to θ under the integral sign. That is, integration and differentiation can be interchanged.
(3). A sufficient condition for Condition 2 is the following: for each θ ∈ Θ, there exist functions g(x), h(x), and H(x) (possibly depending on θ) such that for θ′ ∈ N(θ, ε) = {θ′ : ‖θ′ − θ‖ < ε},
‖ ∂fθ′(x)/∂θ′ ‖ ≤ g(x), ‖ ∂²fθ′(x)/∂θ′² ‖ ≤ h(x), ‖ ∂³ log fθ′(x)/∂θ′³ ‖ ≤ H(x).
(4). Condition 3 ensures that the covariance matrix of ∂ log fθ(x)/∂θ is finite and positive definite.
7.6.3 Formulation of the problem.
Without any constraints, all the k components of parameter θ = (θ1 , ..., θk ) ∈ Θ ∈ Rk are
free to change, and so it has k degrees of freedom.
For the null hypothesis H0 : θ ∈ Θ0 ∈ Rk , suppose that we have r constraints on the
parameter θ, then only k − r components of θ = (θ1 , ..., θk ) are free to change, and so it
has k − r degrees of freedom. Without loss of generality, we denote these k − r dimension
parameter by ν = (ν1 , ..., νk−r ).
H0 : θ = g(ν),
g : Rk−r → Rk
Example. Express H0 in the form θ = g(ν) in the following cases: (a) θ = (θ1, θ2) and H0 : θ1 = a0; (b) θ = (θ1, θ2, θ3) and H0 : θ1 = b0; (c) θ = (θ1, θ2, θ3) and H0 : θ1 = θ2; (d) θ = (θ1, θ2, θ3) and H0 : θ1 − θ2 = a0, θ1 + θ2 = b0 (a0, b0 known).
Solution.
(a). Here, Θ = R², k = 2 and r = 1, and θ2 is the only freely changing parameter. Then we can take ν = θ2 ∈ R^{k−r} = R¹, and
θ1 = a0 ≡ g1(ν), θ2 = θ2 ≡ g2(ν).
(b). Here, Θ = R³, k = 3 and r = 1, and θ2, θ3 are the two freely changing parameters. Then we can take ν = (θ2, θ3) ∈ R^{k−r} = R², and
θ1 = b0 ≡ g1(ν), θ2 = θ2 ≡ g2(ν), θ3 = θ3 ≡ g3(ν).
(c). Here, Θ = R³, k = 3 and r = 1, and θ2, θ3 are the two freely changing parameters. Then we can take ν = (θ2, θ3) ∈ R^{k−r} = R², and
θ1 = θ2 ≡ g1(ν), θ2 = θ2 ≡ g2(ν), θ3 = θ3 ≡ g3(ν).
(d). Here, Θ = R³, k = 3 and r = 2, and θ3 is the only freely changing parameter. Then we can take ν = θ3 ∈ R¹, and
θ1 = (b0 + a0)/2 ≡ g1(ν), θ2 = (b0 − a0)/2 ≡ g2(ν), θ3 = θ3 ≡ g3(ν).
Remark 7.6.3 .
(ii). The above theorem can not deal with the composite null hypothesis like
H0 : θ1 ≤ 0.
Theorem 7.6.2 Assume that the regularity conditions in the MLE theorem hold. Assume further that the null hypothesis is H0 : θ = g(ν) ∈ R^k, where ν is a (k − r)-vector of unknown parameters and g : R^{k−r} → R^k is a continuously differentiable function. Then
(i). Under H0, −2 log λ(X) ∼approx χ²r.
(ii). The test that rejects H0 iff −2 log λ(X) > χ²r(1 − α) has asymptotic size α,
where χ²r(1 − α) is the (1 − α)th quantile of χ²r, that is, P(χ²r ≤ χ²r(1 − α)) = 1 − α.
Proof. First,
l(θ) = fθ(X) = Π_{i=1}^n fθ(Xi), log l(θ) = Σ_{i=1}^n log fθ(Xi),
Sn(θ) ≡ ∂ log l(θ)/∂θ = Σ_{i=1}^n ∂ log fθ(Xi)/∂θ = ( Σ_{i=1}^n ∂ log fθ(Xi)/∂θ1, . . . , Σ_{i=1}^n ∂ log fθ(Xi)/∂θk ) (score function),
Sn′(θ) = ∂² log l(θ)/∂θτ ∂θ = [ Σ_{i=1}^n ∂² log fθ(Xi)/∂θa ∂θb ]_{a,b=1,...,k} ,
In(θ) = E[ (∂ log l(θ)/∂θ)τ (∂ log l(θ)/∂θ) ] = −E[ ∂² log l(θ)/∂θτ ∂θ ] = −E[Sn′(θ)] (Fisher information).
From ESn(θ) = 0, var[Sn(θ)] = nI(θ), E[Sn′(θ)] = −nI(θ), together with the CLT for multivariate random vectors, we get
n^{−1/2} Sn(θ) ∼approx N(0, I(θ)),
or equivalently,
n^{−1/2} Sn(θ) I^{−1/2}(θ) ∼approx N(0, Ik),
which implies
n^{−1} Sn(θ) I^{−1}(θ) [Sn(θ)]τ ∼approx χ²k .
(i). Denote the MLE of θ by θ̂, so that sup_{θ∈Θ} l(θ) = l(θ̂). Clearly, it satisfies Sn(θ̂) = 0. By Taylor expansion,
0 = Sn(θ̂) = Sn(θ) + (θ̂ − θ)Sn′(θ) + · · · = Sn(θ) − (θ̂ − θ)[nI(θ)] + · · · .
So
n^{1/2}(θ̂ − θ)I(θ) = n^{−1/2}Sn(θ) + op(1) ∼approx N(0, I(θ)),
or
n^{1/2}(θ̂ − θ)I^{1/2}(θ) = n^{−1/2}Sn(θ)I^{−1/2}(θ) + op(1) ∼approx N(0, Ik),
which implies
n(θ̂ − θ)I(θ)(θ̂ − θ)τ ∼approx χ²k .
Now
−2 log[ l(θ)/l(θ̂) ] = −2[ log l(θ) − log l(θ̂) ]
 = −2[ Sn(θ̂)(θ − θ̂)τ + ½(θ − θ̂)Sn′(θ̂)(θ − θ̂)τ + · · · ]
 = n(θ̂ − θ)I(θ)(θ̂ − θ)τ + op(1)
 ∼approx χ²k .
119
Next, under H0 : θ = g(ν), we define
S̃n(ν) ≡ ∂ log l(g(ν))/∂ν = [∂ log l(g(ν))/∂θ] ∂g(ν)/∂ντ = Sn(θ)D(ν), where D(ν) = ∂g(ν)/∂ντ (score function for ν),
Ĩ(ν) = E[ (∂ log l(g(ν))/∂ν)τ (∂ log l(g(ν))/∂ν) ] = [D(ν)]τ I(θ) D(ν) (Fisher information for ν).
We denote the MLE of ν under H0 by ν̂, so that sup_{θ∈Θ0} l(θ) = sup_ν l(g(ν)) = l(g(ν̂)). Clearly, it satisfies S̃n(ν̂) = 0. By Taylor expansion, as before,
n^{1/2}(ν̂ − ν)Ĩ(ν) = n^{−1/2}S̃n(ν) + op(1) ∼approx N(0, Ĩ(ν)),
or
n^{1/2}(ν̂ − ν)Ĩ^{1/2}(ν) = n^{−1/2}S̃n(ν)Ĩ^{−1/2}(ν) + op(1) ∼approx N(0, I_{k−r}),
which implies
n(ν̂ − ν)Ĩ(ν)(ν̂ − ν)τ ∼approx χ²_{k−r} .
Therefore, under H0,
−2 log[ l(g(ν))/l(g(ν̂)) ] = −2[ log l(g(ν)) − log l(g(ν̂)) ]
 = −2[ S̃n(ν̂)(ν − ν̂)τ + ½(ν − ν̂)S̃n′(ν̂)(ν − ν̂)τ + · · · ]
 = n(ν̂ − ν)Ĩ(ν)(ν − ν̂)τ + op(1)
 ∼approx χ²_{k−r} .
Combining the two expansions,
−2 log λ(X) = n(θ̂ − θ)I(θ)(θ̂ − θ)τ − n(ν̂ − ν)Ĩ(ν)(ν̂ − ν)τ + op(1)
 = n^{−1}Sn(θ)I^{−1}(θ)[Sn(θ)]τ − n^{−1}S̃n(ν)Ĩ^{−1}(ν)[S̃n(ν)]τ + op(1)
 = n^{−1}Sn(θ)I^{−1}(θ)[Sn(θ)]τ − n^{−1}Sn(θ)D(ν)Ĩ^{−1}(ν)[Sn(θ)D(ν)]τ + op(1)
 = n^{−1}Sn(θ){ I^{−1}(θ) − D(ν)Ĩ^{−1}(ν)[D(ν)]τ }[Sn(θ)]τ + op(1)
 = n^{−1}Sn(θ)I^{−1/2}(θ){ Ik − I^{1/2}(θ)D(ν)Ĩ^{−1}(ν)[D(ν)]τ I^{1/2}(θ) }[Sn(θ)I^{−1/2}(θ)]τ + op(1)
 = n^{−1}Sn(θ)I^{−1/2}(θ) A [Sn(θ)I^{−1/2}(θ)]τ + op(1),
where A = Ik − H and H = I^{1/2}(θ)D(ν)Ĩ^{−1}(ν)[D(ν)]τ I^{1/2}(θ); both A and H are symmetric. Now
H² = HH = I^{1/2}(θ)D(ν)Ĩ^{−1}(ν){[D(ν)]τ I(θ)D(ν)}Ĩ^{−1}(ν)[D(ν)]τ I^{1/2}(θ)
   = I^{1/2}(θ)D(ν)Ĩ^{−1}(ν)Ĩ(ν)Ĩ^{−1}(ν)[D(ν)]τ I^{1/2}(θ)
   = I^{1/2}(θ)D(ν)Ĩ^{−1}(ν)[D(ν)]τ I^{1/2}(θ) = H.
Hence,
A² = (I − H)² = I − 2H + H² = I − 2H + H = I − H = A.
Therefore, A (and H) is idempotent (a projection matrix), so its eigenvalues are either 1's or 0's, and its rank (= total number of unit eigenvalues) is
rank(A) = tr(A) = k − tr[ I^{1/2}(θ)D(ν)Ĩ^{−1}(ν)[D(ν)]τ I^{1/2}(θ) ]
        = k − tr[ Ĩ^{−1}(ν)[D(ν)]τ I(θ)D(ν) ] = k − tr[ Ĩ^{−1}(ν)Ĩ(ν) ] = k − tr[I_{k−r}] = r.
Therefore, there exists an orthonormal matrix B such that
A = B diag(Ir, 0) Bτ .
Thus, writing Z = (Z1, . . . , Zk) ∼ N(0, Ik),
−2 log λ(X) = n^{−1}Sn(θ)I^{−1/2}(θ)A[Sn(θ)I^{−1/2}(θ)]τ + op(1)
 ∼approx Z B diag(Ir, 0) Bτ Zτ = (ZB) diag(Ir, 0) (ZB)τ (since ZB ∼ N(0, Ik))
 = Z1² + · · · + Zr² ∼ χ²r .
(ii). Following from part (i),
Pθ0( −2 log λ(X) > χ²r(1 − α) ) ≈ Pθ0( χ²r > χ²r(1 − α) ) = α.
Remark 7.6.4 The power calculation by χ2 approximation can also be done by using
non-central χ2 approximation under the contiguous alternative. See Xiru Chen, p336.
7.7 Wald’s and Rao’s tests and their relation with
LRT
Two other tests closely related to the chi-squared tests are Wald’s test and Rao’s test.
First we note that the null hypothesis
H0 : θ = g(ν)
can often be equivalently rewritten as
H0 : R(θ) = 0,
where R : R^k → R^r collects the r constraints.
Example. Find the transformation R for the following tests (considered before).
(a). θ = (θ1 , θ2 ) and H0 : θ1 = a0 , where a0 is known.
(b). θ = (θ1, θ2, θ3) and H0 : θ1 = b0, where b0 is known.
(c). θ = (θ1 , θ2 , θ3 ) and H0 : θ1 = θ2 .
(d). θ = (θ1 , θ2 , θ3 ) and H0 : θ1 − θ2 = a0 and θ1 + θ2 = b0 , where a0 and b0
are both known.
Solution.
(a). Here, Θ = R2 , k = 2 and r = 1. Then we can take R1 (θ) = θ1 − a0 .
(b). Here, Θ = R3 , k = 3 and r = 1. Then we can take R1 (θ) = θ1 − b0 .
(c). Here, Θ = R3 , k = 3 and r = 1. Then we can take R1 (θ) = θ1 − θ2 .
(d). Here, Θ = R3 , k = 3 and r = 2. Then we can take
R1 (θ) = θ1 − θ2 − a0 , R2 (θ) = θ1 + θ2 − b0 .
Under H0 : R(θ) = 0, R(θ̂) should also be close to 0. This leads to Wald's test.
Definition 7.7.1 (Wald's test, 1943) Let

Wn = R(θ̂)^τ {R′(θ̂)[In(θ̂)]^{−1}[R′(θ̂)]^τ}^{−1} R(θ̂),

where R′(θ) denotes the r × k matrix of partial derivatives ∂R(θ)/∂θ^τ. Wald's test rejects H0 iff Wn > C.
Similarly, under H0 : θ = g(ν), Sn(g(ν̂)) should also be close to 0. This leads to Rao's score test.
Definition 7.7.2 (Rao's score test, 1947) Let

Rn = Sn(g(ν̂))[In(g(ν̂))]^{−1}[Sn(g(ν̂))]^τ,

where everything has been defined before. Rao's test rejects H0 iff Rn > C.
Theorem 7.7.1 Assume the regularity conditions of the last theorem hold.

(i). Under H0, Wn ∼approx χ²_r and Rn ∼approx χ²_r.

(ii). The tests based on Wald's or Rao's statistic with critical value C = χ²_r(1 − α) have asymptotic size α.
Proof. The proof is simple. See, e.g., Shao (1998, page 836). We only prove part (i) here, for Wn. Note that R(θ) is an r × 1 vector. Now under H0, Taylor expansion gives

0 = R(θ) = R(θ̂)_{r×1} + R′(θ̂)_{r×k}[(θ − θ̂)^τ]_{k×1} + ⋯

Thus

R(θ̂) = R′(θ̂)(θ̂ − θ)^τ + ⋯ ≈ R′(θ)(θ̂ − θ)^τ ∼approx N(0, R′(θ)In^{−1}(θ)[R′(θ)]^τ),

where In(θ) = nI(θ). Thus

R(θ̂)^τ {R′(θ)In^{−1}(θ)[R′(θ)]^τ}^{−1} R(θ̂) ∼approx χ²_r,

and hence, replacing θ by the consistent estimate θ̂,

Wn = R(θ̂)^τ {R′(θ̂)In^{−1}(θ̂)[R′(θ̂)]^τ}^{−1} R(θ̂) ∼approx χ²_r.
Remark 7.7.1

1. The LRT, Wald's and Rao's score tests are all asymptotically χ²_r, and thus asymptotically equivalent.

2. Note that the LRT requires computing both θ̂ and ν̂, Wald's test requires computing only θ̂, and Rao's score test requires computing only ν̂. One can use whichever is easiest.
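To make the three statistics concrete, here is a minimal sketch (our own example, not from the notes) for a Bernoulli(p) sample and H0 : p = p0, where θ = p is scalar (k = 1), the restriction is R(p) = p − p0 (r = 1), and the null leaves no free parameter. Note how Rao's statistic needs only the null value p0, while Wald's needs only the unrestricted MLE p̂.

    # Sketch: the Bernoulli model, p0 and the seed are illustrative choices.
    import numpy as np

    def three_tests(x, p0):
        n, s = len(x), x.sum()
        phat = s / n                                     # unrestricted MLE
        lrt = 2 * (s * np.log(phat / p0)
                   + (n - s) * np.log((1 - phat) / (1 - p0)))
        # Wald: R(p) = p - p0, R'(p) = 1, I_n(phat) = n / (phat(1 - phat))
        wald = n * (phat - p0) ** 2 / (phat * (1 - phat))
        # Rao: S_n(p0) = (s - n p0)/(p0(1 - p0)), evaluated at p0 only
        rao = (s - n * p0) ** 2 / (n * p0 * (1 - p0))
        return lrt, wald, rao

    rng = np.random.default_rng(1)
    x = rng.binomial(1, 0.5, size=500)
    print(three_tests(x, p0=0.5))   # all three are approximately chi^2_1 under H0

In this example Rao's statistic coincides with Pearson's χ² for two cells, which anticipates the next section.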
7.9 χ²-tests for the multinomial distribution

Many tests originate from testing the probabilities in the multinomial distribution. Typically, their asymptotic distribution is χ².
7.9.1 Preliminaries with the multinomial distribution

Suppose that each of n independent trials results in one of k possible outcomes, with probabilities p1, ..., pk. Let

ξi = (0, ..., 0, 1, 0, ..., 0),

where the single nonzero component 1 is located in the jth position if the ith trial yields the jth outcome. Then ξ1, ..., ξn are iid random vectors of dimension k, and

X = (X1, ..., Xk) = Σ_{i=1}^n ξi,   X/n = (1/n) Σ_{i=1}^n ξi = ξ̄.
Theorem 7.9.1 The m.g.f. of X = Σ_{i=1}^n ξi is

MX(t) = Π_{i=1}^n Mξi(t) = (p1 e^{t1} + ⋯ + pk e^{tk})^n.

Therefore,

∂MX(t)/∂ti = n pi e^{ti} (p1 e^{t1} + ⋯ + pk e^{tk})^{n−1},

∂²MX(t)/∂ti² = n(n − 1) pi² e^{2ti} (p1 e^{t1} + ⋯ + pk e^{tk})^{n−2} + n pi e^{ti} (p1 e^{t1} + ⋯ + pk e^{tk})^{n−1},

∂²MX(t)/∂ti∂tj = n(n − 1) pi pj e^{ti} e^{tj} (p1 e^{t1} + ⋯ + pk e^{tk})^{n−2},   i ≠ j,
from which we get

EXi = ∂MX(t)/∂ti |_{t=0} = n pi,
E(Xi²) = ∂²MX(t)/∂ti² |_{t=0} = n(n − 1)pi² + n pi,
Var(Xi) = E(Xi²) − [E(Xi)]² = n(n − 1)pi² + n pi − n²pi² = n pi(1 − pi),
E(Xi Xj) = ∂²MX(t)/∂ti∂tj |_{t=0} = n(n − 1) pi pj,   i ≠ j,
Cov(Xi, Xj) = E(Xi Xj) − E(Xi)E(Xj) = n(n − 1) pi pj − n² pi pj = −n pi pj.
The proof is complete.
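These moment formulas are easy to sanity-check by simulation; the following sketch (our own, with made-up n and p) compares the empirical moments of multinomial draws with n pi and n[diag(p) − p p^τ].

    # Sketch: n, p and the number of replications are illustrative choices.
    import numpy as np

    rng = np.random.default_rng(2)
    n, p = 50, np.array([0.2, 0.3, 0.5])
    X = rng.multinomial(n, p, size=200000)      # draws of X = (X1, X2, X3)

    print(X.mean(axis=0), n * p)                # empirical means vs n p_i
    print(np.cov(X, rowvar=False))              # empirical covariance ...
    print(n * (np.diag(p) - np.outer(p, p)))    # ... vs n [diag(p) - p p^tau]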
Definition. Suppose that we wish to test H0 : p = p0 = (p01, ..., p0k) versus H1 : p ≠ p0.

(i). The χ²-statistic (or Pearson's statistic) is defined by

χ² = Σ_{i=1}^k (Xi − n p0i)²/(n p0i).

(ii). The modified χ²-statistic is defined by

χ̃² = Σ_{i=1}^k (Xi − n p0i)²/Xi.
Remark 7.9.1

(1) We shall show below that Pearson's χ² is asymptotically χ²-distributed with k − 1 degrees of freedom (and so is the modified χ̃², by Slutsky's theorem). To illustrate the idea, consider the special case k = 2. Then

χ² = Σ_{i=1}^2 (Xi − n p0i)²/(n p0i)
= (X1 − n p01)²/(n p01) + (X2 − n p02)²/(n p02)
= (X1 − n p01)²/(n p01) + ([n − X1] − n[1 − p01])²/(n[1 − p01])
= (X1 − n p01)²/(n p01) + (X1 − n p01)²/(n[1 − p01])
= (X1 − n p01)² (1/(n p01) + 1/(n[1 − p01]))
= (X1 − n p01)²/(n p01[1 − p01])
= [(X1 − n p01)/√(n p01[1 − p01])]²
∼approx χ²_1 = χ²_{2−1}.

(2) The above example also shows why we use n p0i in the denominator of χ², rather than the variance n p0i(1 − p0i).
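The k = 2 identity above can be checked numerically; in the following sketch the count x1 and the values of n and p01 are made up for illustration.

    # Sketch: n, p01 and the observed count x1 are made up for illustration.
    import numpy as np

    n, p01 = 100, 0.3
    x1 = 24
    x2 = n - x1
    chi2 = (x1 - n*p01)**2 / (n*p01) + (x2 - n*(1-p01))**2 / (n*(1-p01))
    z2 = ((x1 - n*p01) / np.sqrt(n*p01*(1-p01)))**2
    print(chi2, z2)    # identical up to rounding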
Theorem 7.9.2 Both the χ²- and modified χ²-statistics are asymptotically χ²_{k−1}. That is, under H0 : p = p0,

χ² ∼approx χ²_{k−1}   and   χ̃² ∼approx χ²_{k−1}.
Proof. We shall prove the theorem using two very different methods.

Method 1. Under H0, write

χ² = Y Y^τ = ‖Y‖²,   where Y = [X − EX]/[√EX] = ((X1 − np1)/√(np1), ..., (Xk − npk)/√(npk)),

with p = p0 (we drop the subscript 0 for brevity). So it suffices to show that Y is asymptotically multivariate normal. First of all, the characteristic function (c.f.) of X = Σ_{i=1}^n ξi is

MX(it) = E e^{i t1 X1 + ⋯ + i tk Xk} = (p1 e^{it1} + ⋯ + pk e^{itk})^n.
Thus the c.f. of Y = [X − EX]/[√EX] = ((X1 − np1)/√(np1), ..., (Xk − npk)/√(npk)) is

E e^{i t·Y} = E exp{i t1 (X1 − np1)/√(np1) + ⋯ + i tk (Xk − npk)/√(npk)}
= exp{−i(t1 √(np1) + ⋯ + tk √(npk))} E exp{i X1 t1/√(np1) + ⋯ + i Xk tk/√(npk)}
= exp{−i √n Σ_{j=1}^k tj √pj} [Σ_{j=1}^k pj exp(i tj/√(npj))]^n,

and expanding the logarithm as n → ∞ (using log(1 + x) = x − x²/2 + ⋯; note that uk = Σ_j tj √pj and Σ_j uj² = Σ_j tj²),

log E e^{i t·Y} = −(1/2) Σ_{j=1}^{k−1} uj² + ⋯,

where u = (u1, ..., uk) = (t1, ..., tk)A_{k×k} = tA_{k×k} and A is an orthonormal matrix with the last column being (√p1, ..., √pk)^τ. Note that Z = Y A has cumulant generating function

κZ(u) = log E e^{i u Z^τ} = log E e^{i t A A^τ Y^τ} = log E e^{i t Y^τ} = −(1/2) Σ_{j=1}^{k−1} uj² + ⋯
From this, we get

Z ∼approx Nk(0, ΣZ),   where ΣZ = diag(I_{k−1}, 0).

That is, Z1, ..., Z_{k−1} ∼approx iid N(0, 1) and Zk →p 0. Thus, by Slutsky's theorem,

Y1² + ⋯ + Yk² = Y Y^τ = Y A A^τ Y^τ = (Y A)(Y A)^τ = Z Z^τ = Z1² + ⋯ + Zk² ∼asy χ²_{k−1}.

The second part of the theorem (for the modified χ̃²) follows from the fact that X/n → p with probability 1, together with Slutsky's theorem.
Method 2. Write χ² = Yn(p)[Yn(p)]^τ, where

D(p) = diag(p1^{−1/2}, ..., pk^{−1/2}),
Zn(p) = √n (X/n − p) = √n (X1/n − p1, ..., Xk/n − pk),
Yn(p) = Zn(p)_{1×k} D(p)_{k×k}.

By the multivariate CLT, Zn(p) ∼approx N(0, Σ), or equivalently,

Wn ≡ Zn(p)Σ^{−1/2} ∼approx N(0, Ik),

where

Σ = ( p1(1−p1)   −p1p2      ⋯   −p1pk
      −p2p1      p2(1−p2)   ⋯   −p2pk
      ⋯          ⋯          ⋯   ⋯
      −pkp1      −pkp2      ⋯   pk(1−pk) ).

Note that Σ is not of full rank, since each row sums to zero. However, we can still find Σ^{1/2}, even though it may not be unique.
Next we have

Yn(p) = Zn(p)D(p) ∼approx N(0, B),

where

B = D(p)ΣD(p)
= diag(p1^{−1/2}, ..., pk^{−1/2}) Σ diag(p1^{−1/2}, ..., pk^{−1/2})
= ( 1−p1       −√(p1p2)   ⋯   −√(p1pk)
    −√(p1p2)   1−p2       ⋯   −√(p2pk)
    ⋯          ⋯          ⋯   ⋯
    −√(p1pk)   −√(p2pk)   ⋯   1−pk )
= Ik − ( √(p1p1)   √(p1p2)   ⋯   √(p1pk)
         √(p1p2)   √(p2p2)   ⋯   √(p2pk)
         ⋯         ⋯         ⋯   ⋯
         √(p1pk)   √(p2pk)   ⋯   √(pkpk) )
= Ik − (√p1, √p2, ..., √pk)^τ (√p1, √p2, ..., √pk)
= Ik − φ^τ φ,

where φ = (√p1, √p2, ..., √pk).
Now

‖Yn(p0)‖² = Yn(p0)[Yn(p0)]^τ = Zn(p0)D(p0)D(p0)[Zn(p0)]^τ
= Zn(p0)Σ^{−1/2} Σ^{1/2}D(p0)D(p0)Σ^{1/2} Σ^{−1/2}[Zn(p0)]^τ
= (Zn(p0)Σ^{−1/2})(Σ^{1/2}D(p0)D(p0)Σ^{1/2})(Zn(p0)Σ^{−1/2})^τ
= Wn A Wn^τ
∼approx W A W^τ,

where W ∼ N(0, Ik) and A = Σ^{1/2}D(p0)D(p0)Σ^{1/2}.
First we claim that B is idempotent, since

B² = (Ik − φ^τφ)² = Ik − 2φ^τφ + φ^τ(φφ^τ)φ = Ik − 2φ^τφ + φ^τ·1·φ = Ik − φ^τφ = B,

using φφ^τ = p1 + ⋯ + pk = 1. Consequently, A is also idempotent. Since D(p0)ΣD(p0) = B,

A² = Σ^{1/2}D(p0)[D(p0)ΣD(p0)]D(p0)Σ^{1/2} = Σ^{1/2}D(p0)BD(p0)Σ^{1/2}

and

A³ = Σ^{1/2}D(p0)BD(p0)ΣD(p0)D(p0)Σ^{1/2} = Σ^{1/2}D(p0)B²D(p0)Σ^{1/2} = Σ^{1/2}D(p0)BD(p0)Σ^{1/2} = A².

That is, the eigenvalues of A are either 1 or 0 (if AX = λX, then A³X = λ³X and A²X = λ²X, so λ³ = λ², giving λ = 1 or 0); since A is symmetric, it follows that A is idempotent. And the rank of A is

rank(A) = tr[Σ^{1/2}D(p0)D(p0)Σ^{1/2}] = tr[D(p0)ΣD(p0)] = tr[B] = tr[Ik] − tr[φ^τφ] = k − tr[φφ^τ] = k − 1.
Therefore, there exists an orthogonal matrix Γ such that

A = Γ diag(I_{k−1}, 0) Γ^τ.

Thus,

W A W^τ = W Γ diag(I_{k−1}, 0) Γ^τ W^τ
= V diag(I_{k−1}, 0) V^τ   (where V = W Γ ∼ N(0, Ik))
= V1² + ⋯ + V_{k−1}²
∼ χ²_{k−1}.
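A quick Monte Carlo sketch (our own illustration, with a made-up p0 and k = 4 cells) confirms the theorem: the empirical size of the test that rejects when χ² > χ²_{k−1}(0.95) should be close to 0.05.

    # Sketch: n, p0, k = 4 and the seed are illustrative choices.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(3)
    n, p0 = 300, np.array([0.1, 0.2, 0.3, 0.4])
    X = rng.multinomial(n, p0, size=20000)           # samples drawn under H0
    chi2 = ((X - n*p0)**2 / (n*p0)).sum(axis=1)      # Pearson's statistic

    # empirical size at the chi^2_{k-1}(0.95) cutoff; should be close to 0.05
    print((chi2 > stats.chi2.ppf(0.95, df=len(p0) - 1)).mean())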
Remark 7.9.2

(a). A more general theorem is available; see Theorem 6.8 of Shao (1999, p. 388).

(b). For a more complete treatment of the χ² approximation, see Serfling.
7.9.3 Application: Goodness-of-fit tests

Let X1, ..., Xn be a random sample from a distribution function F. We wish to test

H0 : F = F0 versus H1 : F ≠ F0, (9.14)

where F0 is a known d.f.; for instance, F0 = N(0, 1). In elementary statistics, we know that there are two ways to deal with this problem:

(i). the Kolmogorov-Smirnov test, which is based on the statistic supx |Fn(x) − F0(x)|, where Fn is the empirical d.f.;

(ii). the χ² test.
We shall concentrate on the χ² test in this section. To obtain the χ² test, we partition the range of X into k disjoint events A1, ..., Ak and denote pj = PF(Aj) and p0j = PF0(Aj), j = 1, ..., k. For instance, if the observations are univariate, we can partition the whole line into the k intervals (−∞, a1], (a1, a2], ..., (a_{k−1}, ∞), and denote

p1 = F(a1),
pj = F(aj) − F(a_{j−1}),   j = 2, ..., k − 1,
pk = 1 − F(a_{k−1}).

Now instead of testing (9.14), we can test the less stringent hypothesis

H0 : pi = p0i for all i versus H1 : pi ≠ p0i for some i,

or

H0 : p = p0 versus H1 : p ≠ p0. (9.16)
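As a sketch of this recipe (the sample, the cut points aj and the choice F0 = N(0, 1) below are our own illustrative choices), one bins the data, computes the cell probabilities under F0, and applies Pearson's χ² with k − 1 degrees of freedom.

    # Sketch: the sample, the cut points a_j and F0 = N(0,1) are our choices.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(4)
    x = rng.normal(size=500)                    # here H0: F = F0 is true
    a = np.array([-1.0, 0.0, 1.0])              # cut points -> k = 4 cells
    obs = np.bincount(np.searchsorted(a, x), minlength=len(a) + 1)  # X_1..X_k
    p0 = np.diff(stats.norm.cdf(np.concatenate(([-np.inf], a, [np.inf]))))

    chi2 = ((obs - x.size * p0) ** 2 / (x.size * p0)).sum()
    print(chi2, stats.chi2.ppf(0.95, df=len(p0) - 1))  # reject iff chi2 > cutoff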
Remark 7.9.3 We used the fact that −2 log λ(X) ∼approx χ²_{k−s−1}. Here, let us see how to figure out the degrees of freedom from the general theorem on the LRT.

(1). Without any restriction, p = (p1, ..., pk) satisfies Σ_i pi = 1, so the total number of free parameters is k − 1.

(2). Under H0 : p = p(θ), where θ is s-dimensional, the total number of free parameters under H0 is s.

(3). Therefore, according to the LRT theorem, the degrees of freedom for the asymptotic χ² distribution is (k − 1) − s = k − s − 1.

Therefore,

χ̃² = Σ_{i=1}^k (Xi − n p̂i)²/(n p̂i) = −2 log λ(X) + higher order terms ∼approx χ²_{k−s−1}.
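As a concrete sketch of this degrees-of-freedom count, consider the one-parameter cell model p(θ) = (θ², 2θ(1 − θ), (1 − θ)²) (our own illustrative choice, not from the notes); here k = 3 and s = 1, so the statistic is compared with χ²_{3−1−1} = χ²_1, and the MLE from the counts is θ̂ = (2X1 + X2)/(2n).

    # Sketch: the cell model p(theta) = (theta^2, 2 theta(1-theta), (1-theta)^2)
    # is our own illustrative choice; k = 3, s = 1, so df = k - s - 1 = 1.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(5)
    theta, n = 0.4, 1000
    p = np.array([theta**2, 2*theta*(1-theta), (1-theta)**2])
    X = rng.multinomial(n, p)                     # counts generated under H0

    theta_hat = (2*X[0] + X[1]) / (2*n)           # MLE of theta from the counts
    p_hat = np.array([theta_hat**2, 2*theta_hat*(1-theta_hat), (1-theta_hat)**2])
    chi2 = ((X - n*p_hat)**2 / (n*p_hat)).sum()
    print(chi2, stats.chi2.ppf(0.95, df=1))       # df = 3 - 1 - 1 = 1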
7.10 Test of independence in contingency tables

Consider the r × c contingency table

       A1    A2    ⋯    Ac
B1    X11   X12    ⋯   X1c  | X1·
B2    X21   X22    ⋯   X2c  | X2·
⋯      ⋯     ⋯     ⋯    ⋯
Br    Xr1   Xr2    ⋯   Xrc  | Xr·
      X·1   X·2    ⋯   X·c  | X·· = n

where

the Ai's are disjoint events with ∪_{i=1}^c Ai = Ω (the sample space),
the Bi's are disjoint events with ∪_{i=1}^r Bi = Ω,
Xij is the observed frequency of the outcomes in Bi ∩ Aj.

We wish to test whether the two attributes A and B are independent, i.e.

H0 : pij = pi qj for all i and j,

where pij = P(Bi ∩ Aj), pi = P(Bi) and qj = P(Aj). We are interested in finding the LRT, Wald's, Rao's and Pearson's χ² tests.
LRT

Recall that X = (X11, ..., Xrc) follows a multinomial distribution with θ = (p11, ..., prc). So

l(θ) = Pθ(X = x) = [n!/(X11! ⋯ Xrc!)] p11^{X11} ⋯ prc^{Xrc}.
It is easy to find (e.g., by a Lagrange multiplier argument) that the MLE of θ is p̂ij = Xij/n. Indeed, let L(θ, λ) be the Lagrangian of log l(θ) with the constraint Σ_{i,j} pij = 1, and set

∂L(θ, λ)/∂pij = Xij/pij − λ = 0, (10.19)

∂L(θ, λ)/∂λ = Σ_{i,j} pij − 1 = 0. (10.20)
From (10.19), pij = Xij/λ, so that 1 = Σ_{i,j} pij = Σ_{i,j} Xij/λ = n/λ. That is, λ = n. Therefore, the solution is p̂ij = Xij/n, which completes the proof.

Similarly, under H0 the MLE of the marginal probabilities is

θ̂0 = (p̂1, ..., p̂r, q̂1, ..., q̂c) = (X1·/n, ..., Xr·/n, X·1/n, ..., X·c/n).
Therefore,

λ(X) = l(θ̂0)/l(θ̂)
= [p̂1^{X1·} ⋯ p̂r^{Xr·}][q̂1^{X·1} ⋯ q̂c^{X·c}] / [p̂11^{X11} ⋯ p̂rc^{Xrc}]
= (p̂1 q̂1/p̂11)^{X11} ⋯ (p̂r q̂c/p̂rc)^{Xrc}
= (X1· X·1/(n X11))^{X11} ⋯ (Xr· X·c/(n Xrc))^{Xrc}
= ∏_{i=1}^r ∏_{j=1}^c (Xi· X·j/(n Xij))^{Xij}.
Therefore, writing Xij = n p̂i q̂j (1 + Δij), i.e., Δij = Xij/(n p̂i q̂j) − 1 (which tends to 0 in probability under H0),

−2 log λ(X) = 2 Σ_{i=1}^r Σ_{j=1}^c Xij log[Xij/(n p̂i q̂j)]
= 2 Σ_{i=1}^r Σ_{j=1}^c n p̂i q̂j [1 + Δij] Δij (1 − Δij/2) + higher order terms
= 2 Σ_{i=1}^r Σ_{j=1}^c n p̂i q̂j Δij [1 + Δij/2] + higher order terms
= 2 Σ_{i=1}^r Σ_{j=1}^c n p̂i q̂j Δij + Σ_{i=1}^r Σ_{j=1}^c n p̂i q̂j Δij² + higher order terms
= 2 Σ_{i=1}^r Σ_{j=1}^c (Xij − n p̂i q̂j) + Σ_{i=1}^r Σ_{j=1}^c (Xij − n p̂i q̂j)²/(n p̂i q̂j) + higher order terms
= 2 [Σ_{i,j} Xij − n^{−1} Σ_{i,j} Xi· X·j] + Σ_{i=1}^r Σ_{j=1}^c (Xij − n p̂i q̂j)²/(n p̂i q̂j) + higher order terms
= 2 [n − n] + Σ_{i=1}^r Σ_{j=1}^c (Xij − n p̂i q̂j)²/(n p̂i q̂j) + higher order terms   (since Σ_{i,j} Xi· X·j = n²)
= Σ_{i=1}^r Σ_{j=1}^c (Xij − n p̂i q̂j)²/(n p̂i q̂j) + higher order terms
∼approx χ²_{(r−1)(c−1)}.
As in Remark 7.9.3, we can check the degrees of freedom against the general LRT theorem.

(1). Without any restriction, θ = (p11, ..., prc) satisfies Σ_{i,j} pij = 1, so the total number of free parameters is k ≡ rc − 1.

(2). Under H0 : pij = pi qj for all i and j, the probability pij in each cell is a function of the marginal probabilities. Since Σ_i pi = 1 and Σ_j qj = 1, the total number of free parameters under H0 is s = (r − 1) + (c − 1).

(3). Therefore, according to the LRT theorem, the degrees of freedom for the asymptotic χ² distribution is

(rc − 1) − [(r − 1) + (c − 1)] = rc − r − c + 1 = (r − 1)(c − 1).

(Here there are two different r's: one is from the LRT theorem, the other is the number of rows of the contingency table.)
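In practice one compares each count Xij with the expected count n p̂i q̂j = Xi· X·j/n; a minimal sketch (the 2 × 3 table below is made up for illustration) of Pearson's χ² test of independence:

    # Sketch: the 2 x 3 table of counts X_ij is made up for illustration.
    import numpy as np
    from scipy import stats

    X = np.array([[20, 30, 10],
                  [25, 35, 30]])
    n = X.sum()
    expected = np.outer(X.sum(axis=1), X.sum(axis=0)) / n   # X_i. X_.j / n
    chi2 = ((X - expected) ** 2 / expected).sum()
    df = (X.shape[0] - 1) * (X.shape[1] - 1)                # (r-1)(c-1)
    print(chi2, stats.chi2.ppf(0.95, df=df))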
7.11 Some general comments
Remark 7.11.1 .
7.12 Exercises

1. Define

W = Σ Xi/√(Σ Xi²) = √n X̄/√(Σ Xi²/n),   T = √n X̄/S,

where S² = Σ(Xi − X̄)²/(n − 1). Show that the following identity holds:

T = √((n − 1)/n) · W/√(1 − W²/n).
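(The identity can be verified numerically before proving it; in this quick sketch the simulated sample is an arbitrary illustrative choice.)

    # Sketch: the sample below is arbitrary; any data gives the same agreement.
    import numpy as np

    rng = np.random.default_rng(6)
    x = rng.normal(1.0, 2.0, size=15)
    n = x.size
    W = x.sum() / np.sqrt((x**2).sum())
    S = x.std(ddof=1)                      # S^2 = sum (x_i - xbar)^2 / (n - 1)
    T = np.sqrt(n) * x.mean() / S
    print(T, np.sqrt((n - 1) / n) * W / np.sqrt(1 - W**2 / n))  # the two sides agree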
3. Let X1, ..., Xm and Y1, ..., Yn be two independent samples. Suppose that the X's and Y's have p.d.f.'s

f1(x) = (1/λ1) e^{−x/λ1} I{x > 0}

and

f2(y) = (1/λ2) e^{−y/λ2} I{y > 0},

respectively. We wish to test H0 : λ1 = λ2 versus H1 : λ1 ≠ λ2.
4. Let X1, ..., Xn be a random sample with p.d.f.

f(x) = (1/θ) e^{−x/θ} I{x > 0}.

Find the LRT of
5. Let X1 , ..., Xn be a random sample from a normal distribution with unknown pa-
rameters µ and σ 2 . Find the LRT of
6. Let U1/σ1² ∼ χ²_{d1} and U2/σ2² ∼ χ²_{d2} be independent. Suppose that σ2²/σ1² = a. Show that U2/U1 and aU1 + U2 are independent. In particular, if σ1 = σ2, then U2/U1 and U1 + U2 are independent.

(Hint: either use Basu's theorem or the transformation method.)
7. Let X1, ..., Xn be a random sample with p.d.f.

f(x) = θν^θ/x^{θ+1} I{x ≥ ν},

where θ > 0 and ν > 0.

(b). Show that the LRT of

H0 : θ = 1 versus H1 : θ ≠ 1

has a critical region of the form {x : T(x) < C1 or T(x) > C2}, where 0 < C1 < C2 and

T = log[∏_{i=1}^n Xi/(X(1))^n].
(c). Show that, under H0, 2T has a chi-squared distribution (exact, not approximate), and find the number of degrees of freedom. (Hint: obtain the joint distribution of the n − 1 nontrivial terms Xi/X(1) conditional on X(1). Put these n − 1 terms together, and notice that the distribution of T given X(1) does not depend on X(1), so it is also the unconditional distribution of T.)
8. We have already seen the usefulness of the LRT in dealing with problems with
nuisance parameters. We now look at some other nuisance parameter problems.
(a). Find the LRTs of

H0 : θ ≤ 0 versus H1 : θ > 0

and of

H0 : γ = 1 versus H1 : γ ≠ 1.
9. A special case of a normal family is one in which the mean and the variance are re-
lated, the N (θ, aθ) family. If we are interested in testing this relationship, regardless
of the value of θ, we are again faced with a nuisance parameter problem.
10. Suppose that X1 , . . . , Xn are iid with a beta(µ, 1) pdf and Y1 , ..., Ym are iid with a
beta(θ, 1) pdf. Assume that X’s and Y ’s are independent.
(iii). Find the distribution of T when H0 is true, and show how to get a
test of size α = 0.1.